This PR extracts the write to the in-memory module cache from within
`ASTWriter` into `CompilerInstance.` This brings it closer to other
module cache manipulations, making the ordering much more clear and
explicit.
Also deserialize them back again on reading.
The implementation is based on the existing implementation of `#pragma
weak` serialization.
Fixes issue #186742.
---------
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
This PR removes the assumption that a deserialized module file is backed
by a `FileEntry`. The uniquing and lookup role of `ModuleFile`'s
`FileEntryRef` member is entirely replaced with the `ModuleFileKey`
member. For checking whether an existing `ModuleFile` conforms to the
expectations of importers, the file size and mod time are now stored
directly on `ModuleFile` (previously provided by its `FileEntry`).
Together, these changes enable removal of the
`ModuleManager::lookupByFileName(StringRef)` and
`ModuleManager::lookup(const FileEntry *)` APIs.
Previously we will apply all decl update unconditionally, but it might
not be best. See the attached example
'class-instantiate-no-change-02.cppm' for example. Sometimes these BMIs
are reducible.
Would land after 22 cut.
This PR changes how `ModuleManager` deduplicates module files.
Previously, `ModuleManager` used `FileEntry` for assigning unique
identity to module files. This works fine for explicitly-built modules
because they don't change during the lifetime of a single Clang
instance. For implicitly-built modules however, there are two issues:
1. The `FileEntry` objects are deduplicated by `FileManager` based on
the inode number. Some file systems reuse inode numbers of previously
removed files. Because implicitly-built module files are rapidly removed
and created, this deduplication breaks and compilations may fail
spuriously when inode numbers are recycled during the lifetime of a
single Clang instance.
2. The first thing `ModuleManager` does when loading a module file is
consulting the `FileManager` and checking the file size and modification
time match the expectation of the importer. This is done even when such
module file already lives in the `InMemoryModuleCache`. This introduces
racy behavior into the mechanism that explicitly tries to solve race
conditions, and may lead into spurious compilation failures.
This PR identifies implicitly-built module files by a pair of
`DirectoryEntry` of the module cache path and the path suffix
`<context-hash>/<module-name>-<module-map-path-hash>.pcm`. This gives us
canonicalization of the user-provided module cache path without turning
to `FileEntry` for the PCM file. The path suffix is Clang-generated and
is already canonical.
Some tests needed to be updated because the module cache path directory
was also used as an include directory. This PR relies on not caching the
non-existence of the module cache directory in the `FileManager`. When
other parts of Clang are trying to look up the same path and cache its
non-existence, things break. This is probably very specific to some of
our tests and not how users are setting up their compilations.
Update:
Close https://github.com/llvm/llvm-project/issues/184957
The roo cause of the problem is reduced BMI may not emit everything in
the lookup table, if Reduced BMI **partially** emits some decls, then
the generator may not emit the corresponding entry for the corresponding
name is already there. See
MultiOnDiskHashTableGenerator::insert and
MultiOnDiskHashTableGenerator::emit for details. So we won't emit the
lookup
table if we're generating reduced BMI.
Previously, the normalized module cache path was only accessible via
`HeaderSearch::getSpecificModuleCachePath()` which may or may not also
contain the context hash. Clients would need to parse the result to
learn the normalized module cache path. What `ASTWriter` does instead is
normalize the as-written module cache path redundantly.
Instead, this PR exposes the normalized module cache path in the
`HeaderSearch` interface and moves the computation of specific module
cache path into the clangLex library.
This is motivated by another patch that would've needed to redundantly
perform the module cache path canonicalization or parse the specific
module cache path.
See
https://clang.llvm.org/docs/StandardCPlusPlusModules.html#experimental-non-cascading-changes
for the full background.
In short, we want to cut off the dependencies to improve the
recompilation time.
And namespace is special, as the same namespace can occur in many TUs.
This patch tries to clean up unneeded reference to namespace from other
module file. The touched parts are:
- When write on disk lookup table, don't write the merged table. This
makes the ownership/dependency much cleaner. For performance, I think
the intention was to make the cost of mergin table amortized. But in our
internal workloads, I didn't observe any regression if I turn off
writing the merged table.
- When writing redeclarations, only write the ID of the first namespace
and avoid referenece to all other previous namespaces. (I don't want to
write the first namespace either... but it is more complex).
- When writing the name lookup table, try to write the local namespace.
@jtstogel @mpark I want to invite you to test this with your internal
workloads to figure out the correctness and the performance impact on
this.
I know I can make the change clean for you by inserting code guarded by
"if writing named module" but I think it will be much better if we can
make the underlying implementation more homogeneous if possible.
In https://reviews.llvm.org/D137214 and
https://reviews.llvm.org/D136624, offset adjustment logic was added to
account for the non-affecting module map files that are removed. While
the adjustment logic applies to global source location offsets, they do
not apply to file-internal offsets (relative within the file).
In `ASTWriter::WritePragmaDiagnosticMappings`, the adjustment is applied
to `StatePoint.Offset`s in `StateTransitions`. However, these offsets
are file-internal offsets, not global source location offsets. As such,
applying adjustment to these offsets result in incorrect diagnostic
behavior from the module.
Specifically, wrapping a piece of code in `pragma clang diagnostic
push/pop`, inside of a module is not applied correctly. A new test case
`diag-pragma-nonaffecting.cpp` was added to verify the broken behavior
as well as the corrected behavior with this commit.
Assisted-by: Claude Opus 4.6
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
This patch implements the CodeGen logic for calling __llvm_omp_indirect_call_lookup
on the device when an indirect function call or a virtual function call is made
within an OpenMP target region.
---------
Co-authored-by: Youngsuk Kim
Fold normalization of `.`'s into `FileManager::makeAbsolutePath` so that
different places that need to see through it can just call it instead of
handling it in wrappers.
There shouldn't be a functional impact.
Introduce `OverflowBehaviorType` (OBT), a new type attribute in Clang
that provides developers with fine-grained control over the overflow
behavior of integer types. This feature allows for a more nuanced
approach to integer safety, achieving better granularity than global
compiler flags like `-fwrapv` and `-ftrapv`. Type specifiers are also
available as keywords `__ob_wrap` and `__ob_trap`.
These can be applied to integer types (both signed and unsigned) as well
as typedef declarations, where the behavior is one of the following:
* `wrap`: Guarantees that arithmetic operations on the type will wrap on
overflow, similar to `-fwrapv`. This suppresses UBSan's integer overflow
checks for the attributed type and prevents eager compiler
optimizations.
* `trap`: Enforces overflow checking for the type, even when global
flags like `-fwrapv` would otherwise suppress it.
A key aspect of this feature is its interaction with existing
mechanisms. `OverflowBehaviorType` takes precedence over global flags
and, notably, over entries in the Sanitizer Special Case List (SSCL).
This allows developers to "allowlist" critical types for overflow
instrumentation, even if they are disabled by a broad rule in an SSCL.
Signed-off-by: Justin Stitt <justinstitt@google.com>
This patch adds logic to canonicalize `-include-pch`'s input in the
frontend. This way, the `ASTWriter` always serializes the canonicalized
path to the included pch file whether the input is an absolute path or a
relative path.
Fixes rdar://168596546.
Depends on #169603.
This is the `use_device_ptr` counterpart of #168905.
With OpenMP 6.1, a `fallback` modifier can be specified on the
`use_device_ptr` clause to control the behavior when a pointer lookup
fails, i.e. there is no device pointer to translate into.
The default is `fb_preserve` (i.e. retain the original pointer), while
`fb_nullify` means: use `nullptr` as the translated pointer.
Dependent PR: #173930.
This PR unifies the terminology for:
* "context hash" - previously ambiguously referred to as "module hash"
or as overly specific "module context hash"
* "specific module cache path" - previously referred to as just "module
cache path" - hard to distinguish from the command-line-provided module
cache path without the context hash
NFCI
This reverts commit 1928c1ea9b57e9c44325d436bc7bb2f4585031f3.
We have at least one repro, but I won't be able to work on this until
next week. Also with Clang 22 cut upcoming, we probably need to revert
for now.
RISC-V vector intrinsic is generated dynamically at runtime, thus it's
note preserved in AST yet when using precompile header, neither do
information in SemaRISCV. We need to write these information to ast
record to be able to use precompile header for RISC-V.
Fixes#109634
## Problem
Given code such as `N::foo();`, we perform name look-up on `N`. In the
case where `N` is a namespace declared in imported modules, one
namespace decl (the "key declaration") for each module that declares a
namespace `foo` is loaded and stored. In large scales where there are
many such modules, (e.g., 1,500) and many uses (e.g., 500,000), this
becomes extremely inefficient because every look-up (500,000 of them)
return 1,500 results.
The following synthetic script demonstrates the problem:
```bash
#/usr/bin/env bash
CLANG=${CLANG:-clang++}
NUM_MODULES=${NUM_MODULES:-1500}
NUM_USES=${NUM_USES:-500000}
USE_MODULES=${USE_MODULES:-true}
TMPDIR=$(mktemp -d)
echo "Working in temp directory: $TMPDIR"
cd $TMPDIR
trap "rm -rf \"$TMPDIR\"" EXIT
echo "namespace N { inline void foo() {} }" > m1.h
for i in $(seq 2 $NUM_MODULES); do echo "namespace N {}" > m${i}.h; done
if $USE_MODULES; then
seq 1 $NUM_MODULES | xargs -I {} -P $(nproc) bash -c "$CLANG -std=c++20 -fmodule-header m{}.h"
fi
> a.cpp
if $USE_MODULES; then
for i in $(seq 1 $NUM_MODULES); do echo "import \"m${i}.h\";" >> a.cpp; done
else
for i in $(seq 1 $NUM_MODULES); do echo "#include \"m${i}.h\"" >> a.cpp; done
fi
echo "int main() {" >> a.cpp
for i in $(seq 1 $NUM_USES); do echo " N::foo();" >> a.cpp; done
echo "}" >> a.cpp
if $USE_MODULES; then
time $CLANG -std=c++20 -Wno-experimental-header-units -c a.cpp -o /dev/null \
$(for i in $(seq 1 $NUM_MODULES); do echo -n "-fmodule-file=m${i}.pcm "; done)
else
time $CLANG -std=c++20 -Wno-experimental-header-units -c a.cpp -o /dev/null
fi
```
As of 575d6892bcc5cef926cfc1b95225148262c96a15, without modules
(`USE_MODULES=false`) this takes about **4.5s**, whereas with modules
(`USE_MODULES=true`), this takes about **37s**.
With this PR, without modules there's no change (as expected) at 4.5s,
but with modules it improves to about **5.2s**.
## Approach
The approach taken here aims to maintain status-quo with respect to the
input and output of modules. That is, the `ASTReader` and `ASTWriter`
both read and write the same declarations as it did before. The
difference is in the middle part: the [`StoredDeclsMap` in
`DeclContext`](https://github.com/llvm/llvm-project/blob/release/21.x/clang/include/clang/AST/DeclBase.h#L2024-L2030).
The `StoredDeclsMap` is roughly a `map<DeclarationName,
StoredDeclsList>`. Currently, we read all of the external namespace
decls from `ASTReader`, they all get stored into the `StoredDeclsList`,
and the `ASTWriter` iterates through that list and writes out the
results.
This PR continues to read all of the external namespace decls from
`ASTReader`, but only stores one namespace decl in the
`StoredDeclsList`. This is okay since the reading of the decls handles
all of the merging and chaining of the namespace decls, and as long as
they're loaded and chained, returning one for look-up purposes is
sufficient.
The other half of the problem is to write out all of the external
namespaces that we used to store in `StoredDeclsList` but no longer. For
this, we take advantage of the
[`KeyDecls`](https://github.com/llvm/llvm-project/blob/release/21.x/clang/include/clang/Serialization/ASTReader.h#L1342-L1347)
data structure in `ASTReader`. `KeyDecls` is roughly a `map<Decl *,
vector<GlobalDeclID>>`, and it stores a mapping from the canonical decl
of a redeclarable decl to a list of `GlobalDeclID`s where each ID
represents a "key declaration" from each imported module. More to the
point, if we read external namespaces `N1`, `N2`, `N3` in `ASTReader`,
we'll either have `N1` mapped to `[N2, N3]`, or some newly local
canonical decl mapped to `[N1, N2, N3]`. Either way, we can visit `N1`,
`N2`, and `N3` by doing `ASTReader::forEachImportedKeyDecls(N1,
Visitor)`, and we leverage this to maintain the current behavior of
writing out all of the imported namespace decls in `ASTWriter`.
## Alternatives Attempted
- Tried reading fewer declarations on the `ASTReader` side, and writing
out fewer declarations on the `ASTWriter` side, and neither options
worked at all.
- Tried trying to split `StoredDeclsList` into two pieces, one with
non-namespace decls and one with only namespace decls, but that didn't
work well... I think because the order of the declarations matter
sometimes, and maybe also because the declaration replacement logic gets
more complicated.
- Tried to deduplicate at the `SemaLookup` level. Basically, retrieve
all the stored decls but deduplicate populating the `LookupResult`
[here](https://github.com/llvm/llvm-project/blob/release/21.x/clang/lib/Sema/SemaLookup.cpp#L1137-L1144).
This did improve things slightly, but not quite enough, and this
solution seemed cleaner in the end anyway.
This reverts commit
54a4da9df6.
MSVC supports an extension allowing to delete an array of objects via
pointer whose static type doesn't match its dynamic type. This is done
via generation of special destructors - vector deleting destructors.
MSVC's virtual tables always contain a pointer to the vector deleting
destructor for classes with virtual destructors, so not having this
extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor to the vtable. As a bonus the
deletion of an array of polymorphic object will work just like it does
with MSVC - no memory leaks and correct destructors are called.
This patch will cause clang to emit code that is compatible with code
produced by MSVC but not compatible with code produced with clang of
older versions, so the new behavior can be disabled via passing
-fclang-abi-compat=21 (or lower).
Fixes https://github.com/llvm/llvm-project/issues/19772
- CUDA's dynamic parallelism extension allows device-side kernel
launches, which share the identical syntax to host-side launches, e.g.,
kernel<<<Dg, Db, Ns, S>>>(arguments);
but differ from the code generation. That device-side kernel launches is
eventually translated into the following sequence
config = cudaGetParameterBuffer(alignment, size);
// setup arguments by copying them into `config`.
cudaLaunchDevice(func, config, Dg, Db, Ns, S);
- To support the device-side kernel launch, 'CUDAKernelCallExpr' is
reused but its config expr is set to a call to 'cudaLaunchDevice'.
During the code generation, 'CUDAKernelCallExpr' is expanded into the
sequence aforementioned.
- As the device-side kernel launch requires the source to be compiled as
relocatable device code and linked with '-lcudadevrt'. Linkers are
changed to pass relevant link options to 'nvlink'.
As described in section 2.14.6 of openmp spec, the patch implements
support for iterator in motion clauses.
---------
Co-authored-by: Shashwathi N <nshashwa@pe31.hpc.amslabs.hpecorp.net>
MSVC supports an extension allowing to delete an array of objects via
pointer whose static type doesn't match its dynamic type. This is done
via generation of special destructors - vector deleting destructors.
MSVC's virtual tables always contain a pointer to the vector deleting
destructor for classes with virtual destructors, so not having this
extension implemented causes clang to generate code that is not
compatible with the code generated by MSVC, because clang always puts a
pointer to a scalar deleting destructor to the vtable. As a bonus the
deletion of an array of polymorphic object will work just like it does
with MSVC - no memory leaks and correct destructors are called.
This patch will cause clang to emit code that is compatible with code
produced by MSVC but not compatible with code produced with clang of
older versions, so the new behavior can be disabled via passing
-fclang-abi-compat=21 (or lower).
This is yet another attempt to land vector deleting destructors support
originally implemented by
https://github.com/llvm/llvm-project/pull/133451.
This PR contains fixes for issues reported in the original PR as well as
fixes for issues related to operator delete[] search reported in several
issues like
https://github.com/llvm/llvm-project/pull/133950#issuecomment-2787510484https://github.com/llvm/llvm-project/issues/134265
Fixes https://github.com/llvm/llvm-project/issues/19772
Similar to previous no transitive changes to decls, types, identifiers
and source locations (
https://github.com/llvm/llvm-project/pull/92083https://github.com/llvm/llvm-project/pull/92085https://github.com/llvm/llvm-project/pull/92511https://github.com/llvm/llvm-project/pull/86912
)
This patch does the same thing for MacroID and PreprocessedEntityID.
---
### Some background
Previously we record different IDs linearly. That is, when writing a
module, if we have 17 decls in imported modules, the ID of decls in the
module will start from 18. This makes the contents of the BMI changes if
the we add/remove any decls, types, identifiers and source locations in
the imported modules.
This makes it hard for us to reduce recompilations with modules. We want
to skip recompilations as we think the modules can help us to remove
fake dependencies. This can be done by split the ID into <ModuleIndex,
LocalIndex> pairs.
This is ALREADY done for several different ID above. We call it
non-casacading changes
(https://clang.llvm.org/docs/StandardCPlusPlusModules.html#experimental-non-cascading-changes).
Our internal users have already used this feature and it works well for
years.
Now we want to extend this to MacroID and PreprocessedEntityID. This is
helpful for us in the downstream as we allowed named modules to export
macros. But I believe this is also helpful for header-like modules if
you'd like to explore the area.
And also I think this is a nice cleanup too.
---
Given the use of MacroID and PreprocessedEntityID are not as complicated
as other IDs in the above series, I feel the patch itself should be
good. I hope the vendors can test the patch to make sure it won't affect
existing users.
Close https://github.com/llvm/llvm-project/issues/166068
The cause of the problem is that we would import initializers and
pending implicit instantiations from other named module. This is very
bad and it may waste a lot of time.
And we didn't observe it as the weak symbols can live together and the
strong symbols would be removed by other mechanism. So we didn't observe
the bad behavior for a long time. But it indeeds waste compilation time.
This PR adds support for the `dyn_groupprivate` clause, which will be
part of OpenMP 6.1. This feature allows users to request dynamic shared
memory on target regions.
---------
Co-authored-by: Krzysztof Parzyszek <Krzysztof.Parzyszek@amd.com>
Fixes#165445.
Fixes a crash when `ASTWriter::GenerateNameLookupTable` processes enum
constants from C++20 header units.
The special handling for enum constants, introduced in fccc6ee, doesn't
account for declarations whose owning module is a C++20 header unit. It
calls `isNamedModule()` on the result of
`getTopLevelOwningNamedModule()`, which returns null for header units,
causing a null pointer dereference.
This PR enhances the OpenMP `nowait` clause implementation by adding
support for optional argument in both parsing and semantic analysis
phases.
Reference:
1. OpenMP 6.0 Specification, page 481
This is the first patch of a handful to get the reduction combiner
recipe lowering properly. THIS patch is NFC as it doesn't actually
change anything except the structure of the AST.
For each 'combiner' recipe we need a 'LHS' 'RHS' and expression to
represent the operation.
Each var-reference can have 1 or more combiners.
IF it is a plain scalar, or a struct with the proper operator, or an
array of either of those, there will be 1.
HOWEVER, aggregates without the proper operator are supposed to be
broken down and done from their elements (which can only be scalars). In
this case, we will represent 1 'combiner' recipe per field-decl.
This patch only puts the infrastructure in place to do so, future
patches wll do the work to fill this in.
I originally expected that we were going to need the initExpr stored
separately from the allocaDecl when doing arrays/pointers, however after
implementing it, we found that the idea of having the allocaDecl just
store its init directly still works perfectly. This patch removes the
extra field from the AST.
This change implements the fuse directive, `#pragma omp fuse`, as specified in the OpenMP 6.0, along with the `looprange` clause in clang.
This change also adds minimal stubs so flang keeps compiling (a full implementation in flang of this directive is still pending).
---------
Co-authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
Close https://github.com/llvm/llvm-project/issues/159424
Close https://github.com/llvm/llvm-project/issues/133720
For in-class friend declaration, it is hard for the serializer to decide
if they are visible to other modules. But luckily, Sema can handle it
perfectly enough. So it is fine to make all of the in-class friend
declaration as generally visible in ASTWriter and let the Sema to make
the final call. This is safe as long as the corresponding class's
visibility are correct.
While working on vector deleting destructors support
([GH19772](https://github.com/llvm/llvm-project/issues/19772)), I
noticed that MSVC produces different code in scalar deleting destructor
body depending on whether class defined its own operator delete. In
MSABI deleting destructors accept an additional implicit flag parameter
allowing some sort of flexibility. The mismatch I noticed is that
whenever a global operator delete is called, i.e. `::delete`, in the
code produced by MSVC the implicit flag argument has a value that makes
the 3rd bit set, i.e. "5" for scalar deleting destructors "7" for vector
deleting destructors.
Prior to this patch, clang handled `::delete` via calling global
operator delete direct after the destructor call and not calling class
operator delete in scalar deleting destructor body by passing "0" as
implicit flag argument value. This is fine until there is an attempt to
link binaries compiled with clang with binaries compiled with MSVC. The
problem is that in binaries produced by MSVC the callsite of the
destructor won't call global operator delete because it is assumed that
the destructor will do that and a destructor body generated by clang
will never do.
This patch removes call to global operator delete from the callsite and
add additional check of the 3rd bit of the implicit parameter inside of
scalar deleting destructor body.
---------
Co-authored-by: Tom Honermann <tom@honermann.net>
A DependentTemplateSpecializationType (DTST) is basically just a
TemplateSpecializationType (TST) with a hardcoded DependentTemplateName
(DTN) as its TemplateName.
This removes the DTST and replaces all uses of it with a TST, removing a
lot of duplication in the implementation.
Technically the hardcoded DTN is an optimization for a most common case,
but the TST implementation is in better shape overall and with other
optimizations, so this patch ends up being an overall performance
positive:
<img width="1465" height="38" alt="image"
src="https://github.com/user-attachments/assets/084b0694-2839-427a-b664-eff400f780b5"
/>
A DTST also didn't allow a template name representing a DTN that was
substituted, such as from an alias template, while the TST does allow it
by the simple fact it can hold an arbitrary TemplateName, so this patch
also increases the amount of sugar retained, while still being faster
overall.
Example (from included test case):
```C++
template<template<class> class TT> using T1 = TT<int>;
template<class T> using T2 = T1<T::template X>;
```
Here we can now represent in the AST that `TT` was substituted for the
dependent template name `T::template X`.
This reverts commit 613caa909c78f707e88960723c6a98364656a926, essentially
reapplying 4a4bddec3571d78c8073fa45b57bbabc8796d13d after moving
`normalizeModuleCachePath` from clangFrontend to clangLex.
This PR is part of an effort to remove file system usage from the
command line parsing code. The reason for that is that it's impossible
to do file system access correctly without a configured VFS, and the VFS
can only be configured after the command line is parsed. I don't want to
intertwine command line parsing and VFS configuration, so I decided to
perform the file system access after the command line is parsed and the
VFS is configured - ideally right before the file system entity is used
for the first time.
This patch delays normalization of the module cache path until
`CompilerInstance` is asked for the cache path in the current
compilation context.
This reverts commit 4a4bddec3571d78c8073fa45b57bbabc8796d13d. The Serialization library doesn't link Frontend, where CompilerInstance lives, causing link failures on some build bots.
This PR is part of an effort to remove file system usage from the
command line parsing code. The reason for that is that it's impossible
to do file system access correctly without a configured VFS, and the VFS
can only be configured after the command line is parsed. I don't want to
intertwine command line parsing and VFS configuration, so I decided to
perform the file system access after the command line is parsed and the
VFS is configured - ideally right before the file system entity is used
for the first time.
This patch delays normalization of the module cache path until
`CompilerInstance` is asked for the cache path in the current
compilation context.