This patch reapplies #114258, fixing an infinite recursion bug in
`ASTImporter` that occurs when importing the primary template of a class
template specialization when the latest redeclaration of that template
is a friend declaration in the primary template.
The reduction clause has some minor restrictions on the variable
references that are implementable, so this implements those. Others
require reachability analysis, so this patch documents that we're not
going to do that in the CFE(or at least save it for the MLIR passes).
This PR introduces alpha.webkit.UncheckedLocalVarsChecker which detects
a raw reference or a raw pointer local, static, or global variable to a
CheckedPtr capable object without a guardian variable in an outer scope.
In https://github.com/llvm/llvm-project/pull/109837, it sets a global
variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp
which introduced a data race detected by TSan. To fix this, I decouple
the flag setting, the flags are now set
separately(`instrument-cold-function-only-path` is required to be used
with `--pgo-instrument-cold-function-only`).
This PR removes the `HeaderFileInfo::Framework` member and reduces the
size of this data type from 32B to 16B. This should improve Clang's
memory usage in situations where it keeps track of lots of header files.
NFCI. Depends on #114459.
This PR removes the `-index-header-map` functionality from Clang. AFAIK
this was only used internally at Apple and is now dead code. The main
motivation behind this change is to enable the removal of
`HeaderFileInfo::Framework` member and reducing the size of that data
structure.
rdar://84036149
When reading a path from a bitstream blob, `ASTReader` performs up to
three allocations:
1. Conversion of the `StringRef` blob into `std::string` to conform to
the `ResolveImportedPath()` API that takes `std::string &`.
2. Concatenation of the module file prefix directory and the relative
path into a fresh `SmallString<128>` buffer in `ResolveImportedPath()`.
3. Propagating the result out of `ResolveImportedPath()` by calling
`std::string::assign()` on the out-parameter.
This patch makes is so that we avoid allocations altogether (amortized)
by:
1. Avoiding conversion of the `StringRef` blob into `std::string` and
changing the `ResolveImportedPath()` API.
2. Using one "global" buffer to hold the concatenation.
3. Returning `StringRef` that points into the buffer and ensuring the
contents are not overwritten while it lives.
Note that in some places of the bitstream we don't store paths as blobs,
but rather as records that get VBR-encoded. This makes the allocation in
(1) unavoidable. I plan to fix this in a follow-up PR by changing the
PCM format.
Moreover, there are some data structures (e.g.
`serialization::InputFileInfo`) that store deserialized and resolved
paths as `std::string`. If we don't access them frequently, it would be
more efficient to store just the unresolved `StringRef` and resolve them
on demand (within some kind of shared buffer to prevent allocations).
This PR alone improves `clang-scan-deps` performance on my workload by
3.6%.
This PR is one of the many PRs in the SYCL upstreaming effort focusing
on device code linking during the SYCL offload compilation process. RFC:
https://discourse.llvm.org/t/rfc-offloading-design-for-sycl-offload-kind-and-spir-targets/74088
In this PR, we introduce a new tool that will be used to perform device
code linking for SYCL offload kind. It accepts SYCL device objects in
LLVM IR bitcode format and will generate a fully linked device object
that can then be wrapped and linked into the host object.
A primary use case for this tool is to perform device code linking for
objects with SYCL offload kind inside the clang-linker-wrapper. It can
also be invoked via clang driver as follows:
`clang --target=spirv64 --sycl-link input.bc`
Device code linking for SYCL offloading kind has a number of known
quirks that makes it difficult to use in a unified offloading setting.
Two of the primary issues are:
1. Several finalization steps are required to be run on the fully-linked
LLVM IR bitcode to gaurantee conformance to SYCL standards. This step is
unique to SYCL offloading compilation flow.
2. SPIR-V LLVM Translator tool is an extenal tool and hence SPIR-V IR
code generation cannot be done as part of LTO. This limitation will be
lifted once SPIR-V backend is available as a viable LLVM backend.
Hence, we introduce this new tool to provide a clean wrapper to perform
SYCL device linking.
Co-Author: Michael Toguchi
Thanks
---------
Signed-off-by: Arvind Sudarsanam <arvind.sudarsanam@intel.com>
Swift ClangImporter now supports concurrency annotations on imported
declarations and their parameters/results, to make it possible to use
imported APIs in Swift safely there has to be a way to annotate
individual parameters and result types with relevant attributes that
indicate that e.g. a block is called on a particular actor or it accepts
a `Sendable` parameter.
To faciliate that `SwiftAttr` is switched from `InheritableAttr` which
is a declaration attribute to `DeclOrTypeAttr`. To support this
attribute in type context we need access to its "Attribute" argument
which requires `AttributedType` to be extended to include `Attr *` when
available instead of just `attr::Kind` otherwise it won't be possible to
determine what attribute should be imported.
Before this change, ParseObjc would call the closing
`MaybeParseAttributes` before it had created Objective-C `ParmVarDecl`
objects (and associated name lookup entries), meaning that you could not
reference Objective-C method parameters in
`__attribute__((diagnose_if))`. This change moves the creation of the
`ParmVarDecl` objects ahead of calling `Sema::ActOnMethodDeclaration` so
that `MaybeParseAttributes` can find them. This is already how it works
for C parameters hanging off of the selector.
This change alone is insufficient to enable `diagnose_if` for
Objective-C methods and effectively is NFC. There will be a follow-up PR
for diagnose_if. This change is still useful for any other work that may
need attributes to reference Objective-C parameters.
rdar://138596211
Fix `isInstantiated` and `isInTemplateInstantiation` matchers, so they
return true for instantiations of variable templates, and any
declaration in statements contained in such instantiations.
This patch fixes a couple of regressions introduced in #111852.
Consider:
```
template<typename T>
struct A
{
template<bool U>
static constexpr bool f() requires U
{
return true;
}
};
template<>
template<bool U>
constexpr bool A<short>::f() requires U
{
return A<long>::f<U>();
}
template<>
template<bool U>
constexpr bool A<long>::f() requires U
{
return true;
}
static_assert(A<short>::f<true>()); // crash here
```
This crashes because when collecting template arguments from the _first_
declaration of `A<long>::f<true>` for constraint checking, we don't add
the template arguments from the enclosing class template specialization
because there exists another redeclaration that is a member
specialization.
This also fixes the following example, which happens for a similar
reason:
```
// input.cppm
export module input;
export template<int N>
constexpr int f();
template<int N>
struct A {
template<int J>
friend constexpr int f();
};
template struct A<0>;
template<int N>
constexpr int f() {
return N;
}
```
```
// input.cpp
import input;
static_assert(f<1>() == 1); // error: static assertion failed
```
This helps to produce USRs for custom LangOpts - that differ from the
one associated with the given Decl. This can unlock usecases in tooling
opportunities that we have downstream.
This is NFC because existing calls will still result in the right
overload, thus using the LangOpts associated with the ASTContext of the
Decls and Types.
This starts moving `X86Builtins.def` to be a tablegen file. It's quite
large, so I think it'd be good to move things in multiple steps to avoid
a bunch of merge conflicts due to the amount of time this takes to
complete.
Currently, we store injected template arguments in
`RedeclarableTemplateDecl::CommonBase`. This approach has a couple
problems:
1. We can only access the injected template arguments of
`RedeclarableTemplateDecl` derived types, but other `Decl` kinds still
make use of the injected arguments (e.g.
`ClassTemplatePartialSpecializationDecl`,
`VarTemplatePartialSpecializationDecl`, and `TemplateTemplateParmDecl`).
2. Accessing the injected template arguments requires the common data
structure to be allocated. This may occur before we determine whether a
previous declaration exists (e.g. when comparing constraints), so if the
template _is_ a redeclaration, we end up discarding the common data
structure.
This patch moves the storage and access of injected template arguments
from `RedeclarableTemplateDecl` to `TemplateParameterList`.
This implements a warning that's similar to what GCC does in that
context: both memcpy and memset require their first and second operand
to be trivially copyable, let's warn if that's not the case.
Instead of changing the return type of `ModuleMap::findOrCreateModule`, this patch adds a counterpart that only returns `Module *` and thus has the same signature as `createModule()`, which is important in `ASTReader`.
Previously, we covered returning refs, or copies of optional, and bools.
Now cover returning pointers (to any type).
This is useful for cases like operator-> of smart pointers.
Addresses more of issue llvm#58510
This patch avoids eagerly populating the submodule index on `Module`
construction. The `StringMap` allocation shows up in my profiles of
`clang-scan-deps`, while the index is not necessary most of the time. We
still construct it on-demand.
Moreover, this patch avoids performing qualified submodule lookup in
`ASTReader` whenever we're serializing a module graph whose top-level
module is unknown. This is pointless, since that's guaranteed to never
find any existing submodules anyway.
This speeds up `clang-scan-deps` by ~0.5% on my workload.
With inferred modules, the dependency scanner takes care to replace the
fake "__inferred_module.map" path with the file that allowed the module
to be inferred. However, this only worked when such a module was
imported directly in the TU. Whenever such module got loaded
transitively, the scanner would fail to perform the replacement. This is
caused by the fact that PCM files are lossy and drop this information.
This patch makes sure that PCMs include this file for each submodule (in
the `SUBMODULE_DEFINITION` record), fixes one existing test with an
incorrect assertion, and does a little drive-by refactoring of
`ModuleMap`.
This patch adds support for cold function coverage instrumentation based
on sampling PGO counts. The major motivation is to detect dead functions
for the services that are optimized with sampling PGO. If a function is
covered by sampling profile count (e.g., those with an entry count > 0),
we choose to skip instrumenting those functions, which significantly
reduces the instrumentation overhead.
More details about the implementation and flags:
- Added a flag `--pgo-instrument-cold-function-only` in
`PGOInstrumentation.cpp` as the main switch to control skipping the
instrumentation.
- Built the extra instrumentation passes(a bundle of passes in
`addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by
`--instrument-cold-function-only-path` flag.
- Added a driver flag `-fprofile-generate-cold-function-coverage`:
- 1) Config the flags in one place, i,e. adding
`--instrument-cold-function-only-path=<...>` and
`--pgo-function-entry-coverage`. Note that the instrumentation file path
is passed through `--instrument-sample-cold-function-path`, because we
cannot use the `PGOOptions.ProfileFile` as it's already used by
`-fprofile-sample-use=<...>`.
- 2) makes linker to link `compiler_rt.profile` lib(see
[ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131)
).
- Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry
count to determine cold function.
Overall, the full command is like:
```
clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code
```
Pure Scalable Types are defined in AAPCS64 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#pure-scalable-types-psts
And should be passed according to Rule C.7 here:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#682parameter-passing-rules
This part of the ABI is completely unimplemented in Clang, instead it
treats PSTs sometimes as HFAs/HVAs, sometime as general composite types.
This patch implements the rules for passing PSTs by employing the
`CoerceAndExpand` method and extending it to:
* allow array types in the `coerceToType`; Now only `[N x i8]` are
considered padding.
* allow mismatch between the elements of the `coerceToType` and the
elements of the `unpaddedCoerceToType`; AArch64 uses this to map
fixed-length vector types to SVE vector types.
Corectly passing a PST argument needs a decision in Clang about whether
to pass it in memory or registers or, equivalently, whether to use the
`Indirect` or `Expand/CoerceAndExpand` method. It was considered
relatively harder (or not practically possible) to make that decision in
the AArch64 backend.
Hence this patch implements the register counting from AAPCS64 (cf.
`NSRN`, `NPRN`) to guide the Clang's decision.
Summary:
The C library for GPUs provides the ability to target regular C/C++
programs by providing the C library and a file containing kernels that
call the `main` function. This is mostly used for unit tests, this patch
provides a quick way to add them without needing to know the paths. I
currently do this explicitly, but according to the libc++ contributors
we don't want to need to specify these paths manually. See the
discussion in https://github.com/llvm/llvm-project/pull/104515.
I just default to `lib/` if the target-specific one isn't found because
the linker will handle giving a reasonable error message if it's not
found. Basically the use-case looks like this.
```console
$ clang test.c --target=amdgcn-amd-amdhsa -mcpu=native -startfiles -stdlib
$ amdhsa-loader a.out
PASS!
```
This change moves the `alpha.nondeterministic.PointerSorting` and
`alpha.nondeterministic.PointerIteration` static analyzer checkers to a
single `clang-tidy` check. Those checkers were implemented as simple
`clang-tidy` check-like code, wrapped in the static analyzer framework.
The documentation was updated to describe what the checks can and cannot
do, and testing was completed on a broad set of open-source projects.
Co-authored-by: Vince Bridgers <vince.a.bridgers@ericsson.com>
I noticed that some PCM files contain `HeaderFileInfo` for headers only
included in a dependent PCM file, which is wasteful.
This patch changes the logic to only write headers that are included
locally. This makes the PCM files smaller and saves some superfluous
deserialization of `HeaderFileInfo` triggered by
`Preprocessor::alreadyIncluded()`.
This patch shrinks the size of the `Module` class from 2112B to 1624B. I
wasn't able to get a good data on the actual impact on memory usage, but
given my `clang-scan-deps` workload at hand (with tens of thousands of
instances), I think there should be some win here. This also speeds up
my benchmark by under 0.1%.
ARM ACLE PR#323[1] adds new modal types for 8-bit floating point
intrinsic.
From the PR#323:
```
ACLE defines the `__mfp8` type, which can be used for the E5M2 and E4M3
8-bit floating-point formats. It is a storage and interchange only type
with no arithmetic operations other than intrinsic calls.
````
The type should be an opaque type and its format in undefined in Clang.
Only defined in the backend by a status/format register, for AArch64 the
FPMR.
This patch is an attempt to the add the mfloat8_t scalar type. It has a
parser and codegen for the new scalar type.
The patch it is lowering to and 8bit unsigned as it has no format. But
maybe we should add another opaque type.
[1] https://github.com/ARM-software/acle/pull/323
This fixes remaining issues in my previous PR #90959.
Changes:
- Removed dependency on LLVM header in `xray_interface.cpp`
- Fixed XRay patching for some targets due to missing changes in
architecture-specific patching functions
- Addressed some remaining compiler warnings that I missed in the
previous patch
- Formatting
I have tested these changes on `x86_64` (natively), as well as
`ppc64le`, `aarch64` and `arm32` (cross-compiled and emulated using
qemu).
**Original description:**
This PR introduces shared library (DSO) support for XRay based on a
revised version of the implementation outlined in [this
RFC](https://discourse.llvm.org/t/rfc-upstreaming-dso-instrumentation-support-for-xray/73000).
The feature enables the patching and handling of events from DSOs,
supporting both libraries linked at startup or explicitly loaded, e.g.
via `dlopen`.
This patch adds the following:
- The `-fxray-shared` flag to enable the feature (turned off by default)
- A small runtime library that is linked into every instrumented DSO,
providing position-independent trampolines and code to register with the
main XRay runtime
- Changes to the XRay runtime to support management and patching of
multiple objects
These changes are fully backward compatible, i.e. running without
instrumented DSOs will produce identical traces (in terms of recorded
function IDs) to the previous implementation.
Due to my limited ability to test on other architectures, this feature
is only implemented and tested with x86_64. Extending support to other
architectures is fairly straightforward, requiring only a
position-independent implementation of the architecture-specific
trampoline implementation (see
`compiler-rt/lib/xray/xray_trampoline_x86_64.S` for reference).
This patch does not include any functionality to resolve function IDs
from DSOs for the provided logging/tracing modes. These modes still work
and will record calls from DSOs, but symbol resolution for these
functions in not available. Getting this to work properly requires
recording information about the loaded DSOs and should IMO be discussed
in a separate RFC, as there are mulitple feasible approaches.
---------
Co-authored-by: Sebastian Kreutzer <sebastian.kreutzer@tu-darmstadt.de>
nsw is now added to do-variable increment when -fno-wrapv is enabled as
GFortran seems to do.
That means the option introduced by #91579 isn't necessary any more.
Note that the feature of -flang-experimental-integer-overflow is enabled
by default.