Structural equivalence check uses a cache to store already found
non-equivalent values. This cache can be reused for calls (ASTImporter
does this). Value of "IgnoreTemplateParmDepth" can have an effect on the
structural equivalence therefore it is wrong to reuse the same cache for
checks with different values of 'IgnoreTemplateParmDepth'. The current
change adds the 'IgnoreTemplateParmDepth' to the cache key to fix the
problem.
Combined constructs (OpenACC 3.3 section 2.11) are a short-cut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires we do additional work to ensure that
we get the semantics between the two correct, as well as diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
This implements
https://discourse.llvm.org/t/rfc-add-support-for-controlling-diagnostics-severities-at-file-level-granularity-through-command-line/81292.
Users now can suppress warnings for certain headers by providing a
mapping with globs, a sample file looks like:
```
[unused]
src:*
src:*clang/*=emit
```
This will suppress warnings from `-Wunused` group in all files that
aren't under `clang/` directory. This mapping file can be passed to
clang via `--warning-suppression-mappings=foo.txt`.
At a high level, mapping file is stored in DiagnosticOptions and then
processed with rest of the warning flags when creating a
DiagnosticsEngine. This is a functor that uses SpecialCaseLists
underneath to match against globs coming from the mappings file.
This implies processing warning options now performs IO, relevant
interfaces are updated to take in a VFS, falling back to RealFileSystem
when one is not available.
This PR builds on top of
https://github.com/llvm/llvm-project/pull/115235 and makes it possible
to call `ASTWriter::WriteAST()` with `Preprocessor` only instead of full
`Sema` object. So far, there are no clients that leverage the new
capability - that will come in a follow-up commit.
This PR builds on top of #113984 and attempts to avoid allocating input
file paths eagerly. Instead, the `InputFileInfo` type used by
`ASTReader` now only holds `StringRef`s that point into the PCM file
buffer, and the full input file paths get resolved on demand.
The dependency scanner makes use of this in a bit of a roundabout way:
`ModuleDeps` now only holds (an owning copy of) the short unresolved
input file paths, which get resolved lazily. This can be a big win, I'm
seeing up to a 5% speedup.
After implementing 'loop', we determined that the link to its parent
only ever uses the type, not the construct itself. This patch removes
it, as it is both a waste and causes problems with serialization.
Currently, any FileID that references a module map file that was
required for a compilation is considered as affecting. This misses an
important opportunity to reduce the source location space taken by the
resulting PCM.
In particular, consider the situation where the same module map file is
passed multiple times in the dependency chain:
```shell
$ clang -fmodule-map-file=foo.modulemap ... -o mod1.pcm
$ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod1.pcm ... -o mod2.pcm
...
$ clang -fmodule-map-file=foo.modulemap -fmodule-file=mod$((N-1)).pcm ... -o mod$N.pcm
```
Because `foo.modulemap` is read before reading any of the `.pcm` files,
we have to create a unique `FileID` for it when creating each module.
However, when reading the `.pcm` files, we will reuse the `FileID`
loaded from it for the same module map file and the `FileID` we created
can never be used again, but we will still mark it as affecting and it
will take the source location space in the output PCM.
For a chain of N dependencies, this results in the file taking `N *
(size of file)` source location space, which could be significant. For
examples, we observer internally that some targets that run out of 2GB
of source location space end up wasting up to 20% of that space in
module maps as described above.
I take extra care to still write the InputFile entries for those files that occupied
source location space before. It is required for correctness of clang-scan-deps.
This patch removes `ASTWriter::Context` and starts passing `ASTContext
&` explicitly to functions that actually need it. This is a
non-functional change with the end-goal of being able to write
lightweight PCM files with no `ASTContext` at all.
We made the incorrect assumption that names of fields are unique when
creating their default initializers.
We fix that by keeping track of the instantiaation pattern for field
decls that are placeholder vars,
like we already do for unamed fields.
Fixes#114069
The 'allocator' modifier is now accepted in the 'allocate' clause. Added
LIT tests covering codegen, PCH, template handling, and serialization
for 'allocator' modifier.
Added support for allocator-modifier to release notes.
Testing
- New allocate modifier LIT tests.
- OpenMP LIT tests.
- check-all
- relevant sollve_vv test cases
tests/5.2/scope/test_scope_allocate_construct.c
The `sycl_kernel_entry_point` attribute is used to declare a function that
defines a pattern for an offload kernel to be emitted. The attribute requires
a single type argument that specifies the type used as a SYCL kernel name as
described in section 5.2, "Naming of kernels", of the SYCL 2020 specification.
Properties of the offload kernel are collected when a function declared with
the `sycl_kernel_entry_point` attribute is parsed or instantiated. These
properties, such as the kernel name type, are stored in the AST context where
they are (or will be) used for diagnostic purposes and to facilitate reflection
to a SYCL run-time library. These properties are not serialized with the AST
but are recreated upon deserialization.
The `sycl_kernel_entry_point` attribute is intended to replace the existing
`sycl_kernel` attribute which is intended to be deprecated in a future change
and removed following an appropriate deprecation period. The new attribute
differs in that it is enabled for both SYCL host and device compilation, may
be used with non-template functions, explicitly indicates the type used as
the kernel name type, and will impact AST generation.
This change adds the basic infrastructure for the new attribute. Future
changes will add diagnostics and new AST support that will be used to drive
generation of the corresponding offload kernel.
This PR removes the `HeaderFileInfo::Framework` member and reduces the
size of this data type from 32B to 16B. This should improve Clang's
memory usage in situations where it keeps track of lots of header files.
NFCI. Depends on #114459.
This PR removes the `-index-header-map` functionality from Clang. AFAIK
this was only used internally at Apple and is now dead code. The main
motivation behind this change is to enable the removal of
`HeaderFileInfo::Framework` member and reducing the size of that data
structure.
rdar://84036149
When reading a path from a bitstream blob, `ASTReader` performs up to
three allocations:
1. Conversion of the `StringRef` blob into `std::string` to conform to
the `ResolveImportedPath()` API that takes `std::string &`.
2. Concatenation of the module file prefix directory and the relative
path into a fresh `SmallString<128>` buffer in `ResolveImportedPath()`.
3. Propagating the result out of `ResolveImportedPath()` by calling
`std::string::assign()` on the out-parameter.
This patch makes is so that we avoid allocations altogether (amortized)
by:
1. Avoiding conversion of the `StringRef` blob into `std::string` and
changing the `ResolveImportedPath()` API.
2. Using one "global" buffer to hold the concatenation.
3. Returning `StringRef` that points into the buffer and ensuring the
contents are not overwritten while it lives.
Note that in some places of the bitstream we don't store paths as blobs,
but rather as records that get VBR-encoded. This makes the allocation in
(1) unavoidable. I plan to fix this in a follow-up PR by changing the
PCM format.
Moreover, there are some data structures (e.g.
`serialization::InputFileInfo`) that store deserialized and resolved
paths as `std::string`. If we don't access them frequently, it would be
more efficient to store just the unresolved `StringRef` and resolve them
on demand (within some kind of shared buffer to prevent allocations).
This PR alone improves `clang-scan-deps` performance on my workload by
3.6%.
Instead of changing the return type of `ModuleMap::findOrCreateModule`, this patch adds a counterpart that only returns `Module *` and thus has the same signature as `createModule()`, which is important in `ASTReader`.
This patch avoids eagerly populating the submodule index on `Module`
construction. The `StringMap` allocation shows up in my profiles of
`clang-scan-deps`, while the index is not necessary most of the time. We
still construct it on-demand.
Moreover, this patch avoids performing qualified submodule lookup in
`ASTReader` whenever we're serializing a module graph whose top-level
module is unknown. This is pointless, since that's guaranteed to never
find any existing submodules anyway.
This speeds up `clang-scan-deps` by ~0.5% on my workload.
With inferred modules, the dependency scanner takes care to replace the
fake "__inferred_module.map" path with the file that allowed the module
to be inferred. However, this only worked when such a module was
imported directly in the TU. Whenever such module got loaded
transitively, the scanner would fail to perform the replacement. This is
caused by the fact that PCM files are lossy and drop this information.
This patch makes sure that PCMs include this file for each submodule (in
the `SUBMODULE_DEFINITION` record), fixes one existing test with an
incorrect assertion, and does a little drive-by refactoring of
`ModuleMap`.
I noticed that some PCM files contain `HeaderFileInfo` for headers only
included in a dependent PCM file, which is wasteful.
This patch changes the logic to only write headers that are included
locally. This makes the PCM files smaller and saves some superfluous
deserialization of `HeaderFileInfo` triggered by
`Preprocessor::alreadyIncluded()`.
This patch shrinks the size of the `Module` class from 2112B to 1624B. I
wasn't able to get a good data on the actual impact on memory usage, but
given my `clang-scan-deps` workload at hand (with tens of thousands of
instances), I think there should be some win here. This also speeds up
my benchmark by under 0.1%.
Clang uses timestamp files to track the last time an implicitly-built
PCM file was verified to be up-to-date with regard to its inputs. With
`-fbuild-session-{file,timestamp}=` and
`-fmodules-validate-once-per-build-session` this reduces the number of
times a PCM file is checked per "build session".
The behavior I'm seeing with the current scheme is that when lots of
Clang instances wait for the same PCM to be built, they race to validate
it as soon as the file lock gets released, causing lots of concurrent
IO.
This patch makes it so that the timestamp is written by the same Clang
instance responsible for building the PCM while still holding the lock.
This makes it so that whenever a PCM file gets compiled, it's never
re-validated in the same build session.
I believe this is as sound as the current scheme. One thing to be aware
of is that there might be a time interval between accessing input file N
and writing the timestamp file, where changes to input files 0..<N would
not result in a rebuild. Since this is the case current scheme too, I'm
not too concerned about that.
I've seen this speed up `clang-scan-deps` by ~27%.
This patch adds an IsText parameter to the following getBufferForFile,
getBufferForFileImpl. We introduce a new virtual function
openFileForReadBinary which defaults to openFileForRead except in
RealFileSystem which uses the OF_None flag instead of OF_Text.
The default is set to OF_Text instead of OF_None, this change in value
does not affect any other platforms other than z/OS. Setting this
parameter correctly is required to open files on z/OS in the correct
encoding. The IsText parameter is based on the context of where we open
files, for example, in the ASTReader, HeaderMap requires that files
always be opened in binary even though they might be tagged as text.
The 'vector' clause specifies the iterations to be executed in vector or
SIMD mode. There are some limitations on which associated compute
contexts may be associated with this and have arguments, but otherwise
this is a fairly unrestricted clause.
It DOES have region limits like 'gang' and 'worker'.
The worker clause specifies iterations of the loop/ that are executed in
parallel by distributing the iterations among the multiple works within
a single gang.
The sema rules for this type are simply that it cannot be combined with
a `kernel` construct with a `num_workers` clause, child `loop` clauses
cannot contain a `gang` or `worker` clause, and that the argument is oly
allowed when associated with a `kernel`.
This patch reapplies #111173, fixing a bug when instantiating dependent
expressions that name a member template that is later explicitly
specialized for a class specialization that is implicitly instantiated.
The bug is addressed by adding the `hasMemberSpecialization` function,
which return `true` if _any_ redeclaration is a member specialization.
This is then used when determining the instantiation pattern for a
specialization of a template, and when collecting template arguments for
a specialization of a template.
The 'gang' clause is used to specify parallel execution of loops, thus
has some complicated rules depending on the 'loop's associated compute
construct. This patch implements all of those.
Add the permutation clause for the interchange directive which will be
introduced in the upcoming OpenMP 6.0 specification. A preview has been
published in
[Technical Report12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
This fixes instantiation of definition for friend function templates,
when the declaration found and the one containing the definition
have different template contexts.
In these cases, the the function declaration corresponding to the
definition is not available; it may not even be instantiated at all.
So this patch adds a bit which tracks which function template
declaration was instantiated from the member template.
It's used to find which primary template serves as a context
for the purpose of obtaining the template arguments needed
to instantiate the definition.
Fixes#55509
Reapplies #106585, fixing an issue where non-dependent names of member
templates appearing prior to that member template being explicitly
specialized for an implicitly instantiated class template specialization
would incorrectly use the definition of the explicitly specialized
member template.
The 'tile' clause shares quite a bit of the rules with 'collapse', so a
followup patch will add those tests/behaviors. This patch deals with
adding the AST node.
The 'tile' clause takes a series of integer constant expressions, or *.
The asterisk is now represented by a new OpenACCAsteriskSizeExpr node,
else this clause is very similar to others.
- In Sema, when encountering Decls with function effects needing
verification, add them to a vector, DeclsWithEffectsToVerify.
- Update AST serialization to include DeclsWithEffectsToVerify.
- In AnalysisBasedWarnings, use DeclsWithEffectsToVerify as a work
queue, verifying functions with declared effects, and inferring (when
permitted and necessary) whether their callees have effects.
---------
Co-authored-by: Doug Wyatt <dwyatt@apple.com>
Co-authored-by: Sirraide <aeternalmail@gmail.com>
Co-authored-by: Erich Keane <ekeane@nvidia.com>
The 'collapse' clause on a 'loop' construct is used to specify how many
nested loops are associated with the 'loop' construct. It takes an
optional 'force' tag, and an integer constant expression as arguments.
There are many other restrictions based on the contents of the loop/etc,
but those are implemented in followup patches, for now, this patch just
adds the AST node and does basic argument checking on the loop-count.
Fixed a crash for the attached test case due to we missed to emit the
deduction guide. The reason is, the deduction guide is attached to the
export-decl in the imported module. So we won't emit it by traversing the
AST of the current TU.
Summary:
https://github.com/llvm/llvm-project/pull/109167 serializes
FunctionToLambdasMap in the order of pointers in DenseMap. It gives
different order with different memory layouts. Fix this issue by using
LocalDeclID instead of pointers.
Test Plan: check-clang
This reverts commit e39205654dc11c50bd117e8ccac243a641ebd71f.
There are further discussions in
https://github.com/llvm/llvm-project/pull/70976, happening for past two
weeks. Since there were no responses for couple weeks now, reverting
until author is back.
Some `FileManager` APIs still return `{File,Directory}Entry` instead of
the preferred `{File,Directory}EntryRef`. These are documented to be
deprecated, but don't have the attribute that warns on their usage. This
PR marks them as such with `LLVM_DEPRECATED()` and replaces their usage
with the recommended counterparts. NFCI.