This implements the logic of the `common_type` base template as a
builtin alias. If there should be no `type` member, an empty class is
returned. Otherwise a specialization of a `type_identity`-like class is
returned. The base template (i.e. `std::common_type`) as well as the
empty class and `type_identity`-like struct are given as arguments to
the builtin.
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:
unsigned x=0;
unsigned y=x^^{return 0;}();
This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1
This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.
Part of the reason this is coming up is because WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
This patch stops adjustments of the module cache path beyond what is
done in `ParseHeaderSearchArgs` (making it absolute and removing dots).
This enables more efficient implementation of the caching VFS in
https://github.com/llvm/llvm-project/pull/88800.
The utilities we use for lexing string and character literals can be run
in a mode where we pass a null pointer for the diagnostics engine. This
mode is used by the format string checkers, for example. However, there
were two places that failed to account for a null diagnostic engine
pointer: `\o{}` and `\x{}`.
This patch adds a check for a null pointer and correctly handles
fallback behavior.
Fixes#102218
Consider the following:
```
template<typename T>
struct A { };
template<typename T>
int A<T>::B::* f(); // error: no member named 'B' in 'A<T>'
```
Although this is clearly valid, clang rejects it because the
_nested-name-specifier_ `A<T>::` is parsed as-if it was declarative,
meaning, we parse it as-if it was the _nested-name-specifier_ in a
redeclaration/specialization. However, we don't (and can't) know whether
the _nested-name-specifier_ is declarative until we see the '`*`' token,
but at that point we have already complained that `A` has no member
named `B`! This patch addresses this bug by adding support for _fully_
unannotated _and_ unbounded tentative parsing, which allows for us to
parse past tokens without having to cache them until we reach a point
where we can guarantee to be past the construct we are disambiguating.
I don't know where the approach taken here is ideal -- alternatives are
welcome. However, the performance impact (as measured by
llvm-compile-time-tracker (https://llvm-compile-time-tracker.com/?config=Overview&stat=instructions%3Au&remote=sdkrystian)
is quite minimal (0.09%, which I plan to further improve).
The introduction of `std::put_time` in
fad17b43dbc09ac7e0a95535459845f72c2b739a broke the Solaris build:
```
Undefined first referenced
symbol in file
_ZNKSt8time_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE3putES3_RSt8ios_basecPKSt2tmPKcSB_ lib/libclangLex.a(PPMacroExpansion.cpp.o)
ld: fatal: symbol referencing errors
```
As discussed in PR #99075, the problem is that GCC mangles `std::tm` as
`tm` on Solaris for binary compatility, while `clang` doesn't (Issue
#33114).
As a stop-gap measure, this patch introduces an `asm` level alias to the
same effect.
Tested on `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and
`x86_64-pc-linux-gnu`.
In `clang/lib/Lex/PPMacroExpansion.cpp`, replaced the usage of the
obsolete `asctime` function with `std::put_time` for generating
timestamp strings.
Fixes: https://github.com/llvm/llvm-project/issues/98724
Such searches can be costly and non-intuitive. We've seen complaints
from developers that they don't expect clang to find modules on their
own and not in search paths that developers provide. Keeping the search
of modulemaps in subdirectories for code completion as it provides
better user experience.
If you are defining module "UsefulCode" in
"include/UnrelatedName/module.modulemap", it is recommended to rename
the directory "UnrelatedName" to "UsefulCode". If you cannot do so, you
can add to "include/module.modulemap" a line like `extern module
UsefulCode "UnrelatedName/module.modulemap"`, so clang can find module
"UsefulCode" without checking each subdirectory in "include/".
rdar://106677321
---------
Co-authored-by: Jan Svoboda <jan@svoboda.ai>
Follow-up to `34ab855826b8cb0c3b46c770b83390bd1fe95c64`:
* Don't fail the scan with an include with missing filename, it may be
inside a skipped preprocessor block. Let the compilation provide any
related error.
* Fix an issue where the lexer was skipping through the next directive,
after ignoring the include with missing filename.
Previously source input like `#import ` resulted in infinite calls
append the same token into `CurDirTokens`. This patch now ignores those
directive lines if they won't actually end up being compiled. (e.g.
macro guarded)
resolves: rdar://121247565
This PR implement [P3034R1 Module Declarations Shouldn’t be
Macros](https://wg21.link/P3034R1), and refactor the convoluted state
machines in module name lexical analysis.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
1. Dead code in `LookupEmbedFile`. The loop always exited on the first
iteration. This was also causing a bug of not checking all directories
provided by `--embed-dir`.
2. Use of uninitialized variable `CurTok` in `LexEmbedParameters`. It
was used to initialize the field which seems to be unused. Removed
unused field, this way `CurTok` should be initialized by Lex method.
`import:(type)name` is a method argument decl in ObjC, but the C++20
preprocessing rules say this is a preprocessing line.
Because the dependency directive scanner is not language dependent, this
patch extends the C++20 rule to exclude `module :` (which is never a
valid module decl anyway), and `import :` that is not followed by an
identifier.
This is ok to do because in C++20 mode the compiler will later error on
lines like this anyway, and the dependencies the scanner returns are
still correct.
This enables raw R"" string literals in C in some language modes
and adds an option to disable or enable them explicitly as an
extension.
Background: GCC supports raw string literals in C in `-gnuXY` modes
starting with gnu99. This pr both enables raw string literals in gnu99
mode and later in C and adds an `-f[no-]raw-string-literals` flag to override
this behaviour. The decision not to enable raw string literals in gnu89
mode, according to the GCC devs, is intentional as that mode is supposed
to be used for ‘old code’ that they don’t want to break; we’ve decided to
match GCC’s behaviour here as well.
The `-fraw-string-literals` flag can additionally be used to enable raw string
literals in modes where they aren’t enabled by default (such as c99—as
opposed to gnu99—or even e.g. C++03); conversely, the negated flag can
be used to disable them in any gnuXY modes that *do* provide them by
default, or to override a previous flag. However, we do *not* support
disabling raw string literals (or indeed either of these two options) in
C++11 mode and later, because we don’t want to just start supporting
disabling features that are actually part of the language in the general case.
This fixes#85703.
This commit implements the entirety of the now-accepted [N3017
-Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded by
AST consumers (CodeGen or constant expression evaluators) or expand
embed directive as a comma expression.
This reverts commit
682d461d5a.
---------
Co-authored-by: The Phantom Derpstorm <phdofthehouse@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
This patch is helpful to reduce 32 bits for HeaderFileInfo by combining
a uint32_t and pointer into a tagged pointer.
This is reviewed as part of
https://github.com/llvm/llvm-project/pull/92085 and required to be split
as a separate commit
HeaderSearch::MarkFileModuleHeader is no longer properly checking for
no-changes, and so sets the HeaderFileInfo for every `textual header` to
non-external.
The commit adds serialization and de-serialization implementations for
the stored regions. Basically, the serialized representation of the
regions of a PP is a (ordered) sequence of source location encodings.
For de-serialization, regions from loaded files are stored by their ASTs.
When later one queries if a loaded location L is in an opt-out
region, PP looks up the regions of the loaded AST where L is at.
(Background if helps: a pair of `#pragma clang unsafe_buffer_usage begin/end` pragmas marks a
warning-opt-out region. The begin and end locations (opt-out regions)
are stored in preprocessor instances (PP) and will be queried by the
`-Wunsafe-buffer-usage` analyzer.)
The reported issue at upstream: https://github.com/llvm/llvm-project/issues/90501
rdar://124035402
This commit implements the entirety of the now-accepted [N3017 -
Preprocessor
Embed](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm) and
its sister C++ paper [p1967](https://wg21.link/p1967). It implements
everything in the specification, and includes an implementation that
drastically improves the time it takes to embed data in specific
scenarios (the initialization of character type arrays). The mechanisms
used to do this are used under the "as-if" rule, and in general when the
system cannot detect it is initializing an array object in a variable
declaration, will generate EmbedExpr AST node which will be expanded
by AST consumers (CodeGen or constant expression evaluators) or
expand embed directive as a comma expression.
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
Co-authored-by: H. Vetinari <h.vetinari@gmx.com>
Co-authored-by: Podchishchaeva, Mariya <mariya.podchishchaeva@intel.com>
This commit fixes https://github.com/llvm/llvm-project/issues/88896 by
passing LangOpts from the CompilerInstance to
DependencyScanningWorker so that the original LangOpts are
preserved/respected.
This makes for more accurate parsing/lexing when certain language
versions or features specific to versions are to be used.
Conversion of floating-point literal to binary representation must be
made using constant rounding mode, which can be changed using pragma
FENV_ROUND. For example, the literal "0.1F" should be representes by
either 0.099999994 or 0.100000001 depending on the rounding direction.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
24 under clang/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
`Module::Requirement` was defined as a `std::pair<std::string, bool>`.
This required a comment to explain what the data members mean and makes
the usage harder to understand. Replace this with a struct with two
members, `FeatureName` and `RequiredState`.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
When writing out a PCM, we skip serializing headers' `HeaderFileInfo`
struct whenever this condition evaluates to `true`:
```c++
!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader)
```
However, when Clang parses a module map file, each textual header gets a
`HFI` with `isModuleHeader=false`, `isTextualModuleHeader=true` and
`isCompilingModuleHeader=false`. This means the condition evaluates to
`false` even if the header was never included and the module map did not
affect the compilation. Each PCM file that happened to parse such module
map then contains a copy of the `HeaderFileInfo` struct for all textual
headers, and considers the containing module map affecting.
This patch makes it so that we skip headers that have not been included,
essentially removing the virality of textual headers when it comes to
PCM serialization.
This exposes _BitInt literal suffixes __wb and u__wb as an extension
in C++. There is a new Extension warning, and the tests are
essentially the same as the existing _BitInt literal tests for C but
with a few additional cases.
Fixes#85223
Clang uses the `HeaderFileInfo` struct to track bits of information on
header files, which gets used throughout the compiler. We also use this
to compute the set of affecting module maps in `ASTWriter` and in the
end serialize the information into the `HEADER_SEARCH_TABLE` record of a
PCM file, allowing clients to learn about headers from the module. In
doing so, Clang asks for existing `HeaderFileInfo` for all known
`FileEntries`. Note that this asks the loaded PCM files for the
information they have on each header file in question. This seems
unnecessary: we only want to serialize information on header files that
either belong to the current module or that got included textually.
Loaded PCM files can't provide us with any useful information.
For explicit modules with lazy loading (using `-fmodule-map-file=<path>`
with `-fmodule-file=<name>=<path>`) the compiler knows about header
files listed in the module map files on the command-line. This can be a
large number.
Asking for existing `HeaderFileInfo` can trigger deserialization of
`HEADER_SEARCH_TABLE` from loaded PCM files. Keys of the on-disk hash
table consist of the header file size and modification time. However,
with explicit modules Clang zeroes out the modification time. Moreover,
if you import lots of modules, some of their header files end up having
identical sizes. This means lots of hash collisions that can only be
resolved by running the serialized filename through `FileManager` and
comparing equality of the `FileEntry`. This ends up being super
expensive, essentially re-stating lots of the transitively loaded SDK
header files.
This patch cleans up the API for getting `HeaderFileInfo` and makes sure
`ASTWriter` uses the version that doesn't ask loaded PCM files for more
information. This removes the excessive stat traffic coming from
`ASTWriter` hopefully without changing observable behavior.
This reverts commit 407a2f23 which stopped propagating the callback to module compiles, effectively disabling dependency directive scanning for all modular dependencies. Also added a regression test.
Once a file has been `#import`'ed, it gets stamped as if it was `#pragma
once` and will not be re-entered, even on #include. This means that any
errant #import of a file designed to be included multiple times, such as
<assert.h>, will incorrectly mark it as include-once and break the
multiple include functionality. Normally this isn't a big problem, e.g.
<assert.h> can't have its NDEBUG mode changed after the first #import,
but it is still mostly functional. However, when clang modules are
involved, this can cause the header to be hidden entirely.
Objective-C code most often uses #import for everything, because it's
required for most Objective-C headers to prevent double inclusion and
redeclaration errors. (It's rare for Objective-C headers to use macro
guards or `#pragma once`.) The problem arises when a submodule includes
a multiple-include header. The "already included" state is global across
all modules (which is necessary so that non-modular headers don't get
compiled into multiple translation units and cause redeclaration
errors). If another module or the main file #import's the same header,
it becomes invisible from then on. If the original submodule is not
imported, the include of the header will effectively do nothing and the
header will be invisible. The only way to actually get the header's
declarations is to somehow figure out which submodule consumed the
header, and import that instead. That's basically impossible since it
depends on exactly which modules were built in which order.
#import is a poor indicator of whether a header is actually
include-once, as the #import is external to the header it applies to,
and requires that all inclusions correctly and consistently use #import
vs #include. When modules are enabled, consider a header marked
`textual` in its module as a stronger indicator of multiple-include than
#import's indication of include-once. This will allow headers like
<assert.h> to always be included when modules are enabled, even if
#import is erroneously used somewhere.
An instance of `PreprocessorOptions` is part of `CompilerInvocation`
which is supposed to be a value type. The `DependencyDirectivesForFile`
member is problematic, since it holds an owning reference of the
scanning VFS. This makes it not a true value type, and it can keep
potentially large chunk of memory (the local cache in the scanning VFS)
alive for longer than clients might expect. Let's move it into the
`Preprocessor` instead.
The `ASTWriter` algorithm for computing affecting module maps uses
`SourceManager::translateFile()` to get a `FileID` from a `FileEntry`.
This is slow (O(n)) since the function performs a linear walk over
`SLocEntries` until it finds one with a matching `FileEntry`.
This patch removes this use of `SourceManager::translateFile()` by
tracking `FileID` instead of `FileEntry` in couple of places in
`ModuleMap`, giving `ASTWriter` the desired `FileID` directly. There are
no changes required for clients that still want a `FileEntry` from
`ModuleMap`: the existing APIs internally use `SourceManager` to perform
the reverse `FileID` to `FileEntry` conversion in O(1).
This updates a few warnings that were diagnosing no arguments for a
`...` variadic macro parameter as a GNU extension when it actually is a
C++20/C23 extension now.
This fixes#84495.
On Apple platforms, some of the stddef.h types are also declared in
system headers. In particular NULL has a conflicting declaration in
<sys/_types/_null.h>. When that's in a different module from
<__stddef_null.h>, redeclaration errors can occur.
Make the \_\_stddef_ headers be non-modular in
-fbuiltin-headers-in-system-modules and restore them back to not
respecting their header guards. Still define the header guards though.
__stddef_max_align_t.h was in _Builtin_stddef_max_align_t prior to the
addition of _Builtin_stddef, and it needs to stay in a module because
struct's can't be type merged. __stddef_wint_t.h didn't used to have a
module, but leave it in it current module since it doesn't really belong
to stddef.h.