The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. So all this has to do is support a
pair of clauses that are already implemented (if and async), plus create
an AST node. This patch does so, and adds proper testing.
This is a clause that is only valid on 'host_data' constructs, and
identifies variables which it should use the current device address.
From a Sema perspective, the only thing novel here is mild changes to
how ActOnVar works for this clause, else this is very much like the rest
of the 'var-list' clauses.
'delete' is another clause that has very little compile-time
implication, but needs a full AST that takes a var list. This patch
ipmlements it fully, plus adds sufficient test coverage.
This is another new clause specific to 'exit data' that takes a pointer
argument. This patch implements this the same way we do a few other
clauses (like attach) that have the same restrictions.
The 'if_present' clause controls the replacement of addresses in the
var-list in current device memory. This clause can only go on
'host_device'. From a Sema perspective, there isn't anything to do
beyond add this to AST and pass it on.
This is a very simple clause as far as sema is concerned. It is only
valid on 'exit data', and doesn't have any rules involving it, so it is
simply applied and passed onto the MLIR.
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, serialization, printing/etc.
Additionally the restrictions are all added as tests/todos in the tests,
as those will have to be implemented once we get those clauses implemented.
Reland https://github.com/llvm/llvm-project/pull/83237
---
(Original comments)
Currently all the specializations of a template (including
instantiation, specialization and partial specializations) will be
loaded at once if we want to instantiate another instance for the
template, or find instantiation for the template, or just want to
complete the redecl chain.
This means basically we need to load every specializations for the
template once the template declaration got loaded. This is bad since
when we load a specialization, we need to load all of its template
arguments. Then we have to deserialize a lot of unnecessary
declarations.
For example,
```
// M.cppm
export module M;
export template <class T>
class A {};
export class ShouldNotBeLoaded {};
export class Temp {
A<ShouldNotBeLoaded> AS;
};
// use.cpp
import M;
A<int> a;
```
We have a specialization ` A<ShouldNotBeLoaded>` in `M.cppm` and we
instantiate the template `A` in `use.cpp`. Then we will deserialize
`ShouldNotBeLoaded` surprisingly when compiling `use.cpp`. And this
patch tries to avoid that.
Given that the templates are heavily used in C++, this is a pain point
for the performance.
This patch adds MultiOnDiskHashTable for specializations in the
ASTReader. Then we will only deserialize the specializations with the
same template arguments. We made that by using ODRHash for the template
arguments as the key of the hash table.
To review this patch, I think `ASTReaderDecl::AddLazySpecializations`
may be a good entry point.
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
Currently all the specializations of a template (including
instantiation, specialization and partial specializations) will be
loaded at once if we want to instantiate another instance for the
template, or find instantiation for the template, or just want to
complete the redecl chain.
This means basically we need to load every specializations for the
template once the template declaration got loaded. This is bad since
when we load a specialization, we need to load all of its template
arguments. Then we have to deserialize a lot of unnecessary
declarations.
For example,
```
// M.cppm
export module M;
export template <class T>
class A {};
export class ShouldNotBeLoaded {};
export class Temp {
A<ShouldNotBeLoaded> AS;
};
// use.cpp
import M;
A<int> a;
```
We should a specialization ` A<ShouldNotBeLoaded>` in `M.cppm` and we
instantiate the template `A` in `use.cpp`. Then we will deserialize
`ShouldNotBeLoaded` surprisingly when compiling `use.cpp`. And this
patch tries to avoid that.
Given that the templates are heavily used in C++, this is a pain point
for the performance.
This patch adds MultiOnDiskHashTable for specializations in the
ASTReader. Then we will only deserialize the specializations with the
same template arguments. We made that by using ODRHash for the template
arguments as the key of the hash table.
To review this patch, I think `ASTReaderDecl::AddLazySpecializations`
may be a good entry point.
The patch was reviewed in
https://github.com/llvm/llvm-project/pull/83237 but that PR is a stacked
PR. But I feel the intention of the stacked PRs get lost during the
review process. So I feel it is better to merge the commits into a
single commit instead of merging them in the PR page. It is better for
us to cherry-pick and revert.
This is a follow up to #112015 and it reduces the unnecessary
duplication of source locations further.
We do not need to allocate source location space in the serialized PCMs
for module maps used only to find textual headers. Those module maps are
never referenced from anywhere in the serialized ASTs and are re-read in
other compilations.
This change should not affect correctness of Clang compilations or
clang-scan-deps in any way.
We do need the InputFile entry in the serialized AST because
clang-scan-deps relies on it. The previous patch introduced a mechanism
to do exactly that.
We have found that to finally remove any duplication of module maps we
use internally in our build system.
There were many many "voices" about the too strict flags checking in
modules. Although they rarely challenge this, maybe due to they respect
to the compiler implementation details. But from my point of view, there
are cases it is "fine" to have different flags. Especially we're too
conservative to mark almost language options in
`clang/include/clang/Basic/LangOptions.def` as incompatible options (see
the comments in the front of the file).
In my understanding, this should come from PCH initially since it is
natural to ask your headers to be compiled with the same flags with your
TU. And then, when Apple and Google goes to implement clang module, they
don't challenge it too since they have a closed world where they have a
strong control over the ecosystem so that they can make it consistent.
Yes, consistency is great and ODR violation are awful. But this is the
world we're living today. This is the C++'s ecosystem in the open ended
world. Image a situation that we're using a third party module and we
add a new option to our library, then the build bails out! THIS IS SUPER
ANNOYING. And makes it non practical to make a modular C++ ecosystem.
(
This was discussed many times in SG15. And the consensus is, the build
systems should generate different BMI based on different flags. But this
manner can't avoid ODR violation completely and it would add the times
of module files that need to be built, which may kill the benefit of
faster compilation of modules.
However, I think the build systems may need to do the similar things in
the end of the day. Considering libc++'s hardening mechanism
(https://libcxx.llvm.org/Hardening.html). So the conclusion of the
paragraph is, although this seems related to build systems, I think they
are actually unrelated story.
)
I think we should give our users a chance to disable such checks. It is
theoretically unsafe. But we've done our job to tell the users that it
**MAY** be bad. Then I feel it is C++-ish to give users more freedom
even if they may shoot their foot.
This shouldn't change any thing. Users who want previous behavior can
get it easily by `-Werror=`.
Substituting into pack indexing types/expressions can still result in
unexpanded types/expressions, such as `PackIndexingType` or
`PackIndexingExpr`. To handle these cases correctly, we should defer the
pack size checks to the next round of transformation, when the patterns
can be fully expanded.
To that end, the `FullySubstituted` flag is now necessary for computing
the dependencies of `PackIndexingExprs`. Conveniently, this flag can
also represent the prior `ExpandsToEmpty` status with an additional
emptiness check. Therefore, I converted all stored flags to use
`FullySubstituted`.
Fixes https://github.com/llvm/llvm-project/issues/116105
This PR changes a part of the PCM format to store string-like things in
the blob attached to a record instead of VBR6-encoding them into the
record itself. Applied to the `IMPORTS` section (which is very hot),
this speeds up dependency scanning by 2.8%.
Structural equivalence check uses a cache to store already found
non-equivalent values. This cache can be reused for calls (ASTImporter
does this). Value of "IgnoreTemplateParmDepth" can have an effect on the
structural equivalence therefore it is wrong to reuse the same cache for
checks with different values of 'IgnoreTemplateParmDepth'. The current
change adds the 'IgnoreTemplateParmDepth' to the cache key to fix the
problem.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
This implements
https://discourse.llvm.org/t/rfc-add-support-for-controlling-diagnostics-severities-at-file-level-granularity-through-command-line/81292.
Users now can suppress warnings for certain headers by providing a
mapping with globs, a sample file looks like:
```
[unused]
src:*
src:*clang/*=emit
```
This will suppress warnings from `-Wunused` group in all files that
aren't under `clang/` directory. This mapping file can be passed to
clang via `--warning-suppression-mappings=foo.txt`.
At a high level, mapping file is stored in DiagnosticOptions and then
processed with rest of the warning flags when creating a
DiagnosticsEngine. This is a functor that uses SpecialCaseLists
underneath to match against globs coming from the mappings file.
This implies processing warning options now performs IO, relevant
interfaces are updated to take in a VFS, falling back to RealFileSystem
when one is not available.
This PR builds on top of
https://github.com/llvm/llvm-project/pull/115235 and makes it possible
to call `ASTWriter::WriteAST()` with `Preprocessor` only instead of full
`Sema` object. So far, there are no clients that leverage the new
capability - that will come in a follow-up commit.
This PR builds on top of #113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.
The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.
After implementing 'loop', we determined that the link to its parent
only ever uses the type, not the construct itself. This patch removes
it, as it is both a waste and causes problems with serialization.
Currently, any FileID that references a module map file that was
required for a compilation is considered as affecting. This misses an
important opportunity to reduce the source location space taken by the
resulting PCM.
In particular, consider the situation where the same module map file is
passed multiple times in the dependency chain:
```shell
$ clang -fmodule-map-file=foo.modulemap ... -o mod1.pcm
$ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod1.pcm ... -o mod2.pcm
...
$ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod$((N-1)).pcm ... -o mod$N.pcm
```
Because `foo.modulemap` is read before reading any of the `.pcm` files,
we have to create a unique `FileID` for it when creating each module.
However, when reading the `.pcm` files, we will reuse the `FileID`
loaded from it for the same module map file and the `FileID` we created
can never be used again, but we will still mark it as affecting and it
will take the source location space in the output PCM.
For a chain of N dependencies, this results in the file taking `N *
(size of file)` source location space, which could be significant. For
examples, we observer internally that some targets that run out of 2GB
of source location space end up wasting up to 20% of that space in
module maps as described above.
I take extra care to still write the InputFile entries for those files that occupied
source location space before. It is required for correctness of clang-scan-deps.
This patch removes `ASTWriter::Context` and starts passing `ASTContext
&` explicitly to functions that actually need it. This is a
non-functional change with the end-goal of being able to write
lightweight PCM files with no `ASTContext` at all.
We made the incorrect assumption that names of fields are unique when
creating their default initializers.
We fix that by keeping track of the instantiaation pattern for field
decls that are placeholder vars,
like we already do for unamed fields.
Fixes#114069
The 'allocator' modifier is now accepted in the 'allocate' clause. Added
LIT tests covering codegen, PCH, template handling, and serialization
for 'allocator' modifier.
Added support for allocator-modifier to release notes.
Testing
- New allocate modifier LIT tests.
- OpenMP LIT tests.
- check-all
- relevant sollve_vv test cases
tests/5.2/scope/test_scope_allocate_construct.c
The `sycl_kernel_entry_point` attribute is used to declare a function that
defines a pattern for an offload kernel to be emitted. The attribute requires
a single type argument that specifies the type used as a SYCL kernel name as
described in section 5.2, "Naming of kernels", of the SYCL 2020 specification.
Properties of the offload kernel are collected when a function declared with
the `sycl_kernel_entry_point` attribute is parsed or instantiated. These
properties, such as the kernel name type, are stored in the AST context where
they are (or will be) used for diagnostic purposes and to facilitate reflection
to a SYCL run-time library. These properties are not serialized with the AST
but are recreated upon deserialization.
The `sycl_kernel_entry_point` attribute is intended to replace the existing
`sycl_kernel` attribute which is intended to be deprecated in a future change
and removed following an appropriate deprecation period. The new attribute
differs in that it is enabled for both SYCL host and device compilation, may
be used with non-template functions, explicitly indicates the type used as
the kernel name type, and will impact AST generation.
This change adds the basic infrastructure for the new attribute. Future
changes will add diagnostics and new AST support that will be used to drive
generation of the corresponding offload kernel.
This PR removes the `HeaderFileInfo::Framework` member and reduces the
size of this data type from 32B to 16B. This should improve Clang's
memory usage in situations where it keeps track of lots of header files.
NFCI. Depends on #114459.
This PR removes the `-index-header-map` functionality from Clang. AFAIK
this was only used internally at Apple and is now dead code. The main
motivation behind this change is to enable the removal of
`HeaderFileInfo::Framework` member and reducing the size of that data
structure.
rdar://84036149
When reading a path from a bitstream blob, `ASTReader` performs up to
three allocations:
1. Conversion of the `StringRef` blob into `std::string` to conform to
the `ResolveImportedPath()` API that takes `std::string &`.
2. Concatenation of the module file prefix directory and the relative
path into a fresh `SmallString<128>` buffer in `ResolveImportedPath()`.
3. Propagating the result out of `ResolveImportedPath()` by calling
`std::string::assign()` on the out-parameter.
This patch makes is so that we avoid allocations altogether (amortized)
by:
1. Avoiding conversion of the `StringRef` blob into `std::string` and
changing the `ResolveImportedPath()` API.
2. Using one "global" buffer to hold the concatenation.
3. Returning `StringRef` that points into the buffer and ensuring the
contents are not overwritten while it lives.
Note that in some places of the bitstream we don't store paths as blobs,
but rather as records that get VBR-encoded. This makes the allocation in
(1) unavoidable. I plan to fix this in a follow-up PR by changing the
PCM format.
Moreover, there are some data structures (e.g.
`serialization::InputFileInfo`) that store deserialized and resolved
paths as `std::string`. If we don't access them frequently, it would be
more efficient to store just the unresolved `StringRef` and resolve them
on demand (within some kind of shared buffer to prevent allocations).
This PR alone improves `clang-scan-deps` performance on my workload by
3.6%.
Instead of changing the return type of `ModuleMap::findOrCreateModule`, this patch adds a counterpart that only returns `Module *` and thus has the same signature as `createModule()`, which is important in `ASTReader`.
This patch avoids eagerly populating the submodule index on `Module`
construction. The `StringMap` allocation shows up in my profiles of
`clang-scan-deps`, while the index is not necessary most of the time. We
still construct it on-demand.
Moreover, this patch avoids performing qualified submodule lookup in
`ASTReader` whenever we're serializing a module graph whose top-level
module is unknown. This is pointless, since that's guaranteed to never
find any existing submodules anyway.
This speeds up `clang-scan-deps` by ~0.5% on my workload.
With inferred modules, the dependency scanner takes care to replace the
fake "__inferred_module.map" path with the file that allowed the module
to be inferred. However, this only worked when such a module was
imported directly in the TU. Whenever such module got loaded
transitively, the scanner would fail to perform the replacement. This is
caused by the fact that PCM files are lossy and drop this information.
This patch makes sure that PCMs include this file for each submodule (in
the `SUBMODULE_DEFINITION` record), fixes one existing test with an
incorrect assertion, and does a little drive-by refactoring of
`ModuleMap`.
I noticed that some PCM files contain `HeaderFileInfo` for headers only
included in a dependent PCM file, which is wasteful.
This patch changes the logic to only write headers that are included
locally. This makes the PCM files smaller and saves some superfluous
deserialization of `HeaderFileInfo` triggered by
`Preprocessor::alreadyIncluded()`.
This patch shrinks the size of the `Module` class from 2112B to 1624B. I
wasn't able to get a good data on the actual impact on memory usage, but
given my `clang-scan-deps` workload at hand (with tens of thousands of
instances), I think there should be some win here. This also speeds up
my benchmark by under 0.1%.
Clang uses timestamp files to track the last time an implicitly-built
PCM file was verified to be up-to-date with regard to its inputs. With
`-fbuild-session-{file,timestamp}=` and
`-fmodules-validate-once-per-build-session` this reduces the number of
times a PCM file is checked per "build session".
The behavior I'm seeing with the current scheme is that when lots of
Clang instances wait for the same PCM to be built, they race to validate
it as soon as the file lock gets released, causing lots of concurrent
IO.
This patch makes it so that the timestamp is written by the same Clang
instance responsible for building the PCM while still holding the lock.
This makes it so that whenever a PCM file gets compiled, it's never
re-validated in the same build session.
I believe this is as sound as the current scheme. One thing to be aware
of is that there might be a time interval between accessing input file N
and writing the timestamp file, where changes to input files 0..<N would
not result in a rebuild. Since this is the case current scheme too, I'm
not too concerned about that.
I've seen this speed up `clang-scan-deps` by ~27%.
This patch adds an IsText parameter to the following getBufferForFile,
getBufferForFileImpl. We introduce a new virtual function
openFileForReadBinary which defaults to openFileForRead except in
RealFileSystem which uses the OF_None flag instead of OF_Text.
The default is set to OF_Text instead of OF_None, this change in value
does not affect any other platforms other than z/OS. Setting this
parameter correctly is required to open files on z/OS in the correct
encoding. The IsText parameter is based on the context of where we open
files, for example, in the ASTReader, HeaderMap requires that files
always be opened in binary even though they might be tagged as text.
The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.