This patch is part of the upstreaming effort for supporting SYCL
language front end.
It makes the following changes:
1. Adds sycl_external attribute for functions with external linkage,
which is intended for use to implement the SYCL_EXTERNAL macro as
specified by the SYCL 2020 specification
2. Adds checks to avoid emitting device code when sycl_external and
sycl_kernel_entry_point attributes are not enabled
3. Fixes test failures caused by the above changes
This patch is missing diagnostics for the following diagnostics listed
in the SYCL 2020 specification's section 5.10.1, which will be addressed
in a subsequent PR:
Functions that are declared using SYCL_EXTERNAL have the following
additional restrictions beyond those imposed on other device functions:
1. If the SYCL backend does not support the generic address space then
the function cannot use raw pointers as parameter or return types.
Explicit pointer classes must be used instead;
2. The function cannot call group::parallel_for_work_item;
3. The function cannot be called from a parallel_for_work_group scope.
In addition to that, the subsequent PR will also implement diagnostics
for inline functions including virtual functions defined as inline.
---------
Co-authored-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
The 'cfi_salt' attribute specifies a string literal that is used as a
"salt" for Control-Flow Integrity (CFI) checks to distinguish between
functions with the same type signature. This attribute can be applied
to function declarations, function definitions, and function pointer
typedefs.
This attribute prevents function pointers from being replaced with
pointers to functions that have a compatible type, which can be a CFI
bypass vector.
The attribute affects type compatibility during compilation and CFI
hash generation during code generation.
Attribute syntax: [[clang::cfi_salt("<salt_string>")]]
GNU-style syntax: __attribute__((cfi_salt("<salt_string>")))
- The attribute takes a single string of non-NULL ASCII characters.
- It only applies to function types; using it on a non-function type
will generate an error.
- All function declarations and the function definition must include
the attribute and use identical salt values.
Example usage:
// Header file:
#define __cfi_salt(S) __attribute__((cfi_salt(S)))
// Convenient typedefs to avoid nested declarator syntax.
typedef int (*fp_unsalted_t)(void);
typedef int (*fp_salted_t)(void) __cfi_salt("pepper");
struct widget_ops {
fp_unsalted_t init; // Regular CFI.
fp_salted_t exec; // Salted CFI.
fp_unsalted_t teardown; // Regular CFI.
};
// bar.c file:
static int bar_init(void) { ... }
static int bar_salted_exec(void) __cfi_salt("pepper") { ... }
static int bar_teardown(void) { ... }
static struct widget_generator _generator = {
.init = bar_init,
.exec = bar_salted_exec,
.teardown = bar_teardown,
};
struct widget_generator *widget_gen = _generator;
// 2nd .c file:
int generate_a_widget(void) {
int ret;
// Called with non-salted CFI.
ret = widget_gen.init();
if (ret)
return ret;
// Called with salted CFI.
ret = widget_gen.exec();
if (ret)
return ret;
// Called with non-salted CFI.
return widget_gen.teardown();
}
Link: https://github.com/ClangBuiltLinux/linux/issues/1736
Link: https://github.com/KSPP/linux/issues/365
---------
Signed-off-by: Bill Wendling <morbo@google.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
As reported in #138173, enabling `-ftime-report` adds pass timing info
to the stats file if `-stats-file` is specified. This was determined to
be WAI. However, if one intentionally wants to put timer information in
the stats file, using `-ftime-report` may lead to a lot of logspam (that
can't be removed by directing stderr to `/dev/null` as that would also
redirect compiler errors). To address this, this PR adds a flag
`-stats-file-timers` that adds timer data to the stats file without
outputting to stderr.
This option is similar to -Wuninitialized-const-reference, but diagnoses
the passing of an uninitialized value via a const pointer, like in the
following code:
```
void foo(const int *);
void test() {
int v;
foo(&v);
}
```
This is an extract from #147221 as suggested in [this
comment](https://github.com/llvm/llvm-project/pull/147221#discussion_r2190998730).
This patch adds support for -mcpu=gb10 (NVIDIA GB10). This is a
big.LITTLE cluster of Cortex-X925 and Cortex-A725 cores. The appropriate
MIDR numbers are added to detect them in -mcpu=native.
We did not add an -mcpu=cortex-x925.cortex-a725 option because GB10 does
include the crypto instructions which we want on by default, and the
current convention is to not enable such extensions for Arm Cortex cores
in -mcpu where they are optional in the IP.
Relevant GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/687005.html
I don't see any reason this shouldn't be allowed. AFAICT this is only
disabled due to the heuristics used to determine whether it makes sense
to allow the use of an attribute with `#pragma clang attribute`.
This allows libc++ to drop `_LIBCPP_HIDE_FROM_ABI` in a lot of places,
making the library significantly easier to read.
Extend `isAllocSiteRemovable` to be able to check if the ModRef info
indicates the alloca is only Ref or only Mod, and be able to remove it
accordingly. It seemed that there were a surprising number of
benchmarks with this pattern which weren't getting optimized previously
(due to MemorySSA walk limits). There were somewhat more existing tests
than I'd like to have modified which were simply doing exactly this
pattern (and thus relying on undef memory). Claude code contributed the
new tests (and found an important typo that I'd made).
This implements the discussion in
https://github.com/llvm/llvm-project/pull/143782#discussion_r2142720376.
1. Added the missing XMEGA2 definition. The avr64 devices use xmega2 which has SPM(X) defined.
2. The avr16/avr32 devices do have SPM and SPMX features, but the current xmega3 definition has not.
Xmega3 is also used for modern attiny series which do not have SPM(X), so that is correct.
Leave the avr16/avr32 devices unchanged (using xmega3 to be in sync with gcc definitions).
Fixes https://github.com/llvm/llvm-project/issues/116116
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
Introduce the `reentrant_capability` attribute, which may be specified
alongside the `capability(..)` attribute to denote that the defined
capability type is reentrant. Marking a capability as reentrant means
that acquiring the same capability multiple times is safe, and does not
produce warnings on attempted re-acquisition.
The most significant changes required are plumbing to propagate if the
attribute is present to a CapabilityExpr, and introducing
ReentrancyDepth to the LockableFactEntry class.
As best as I can see, all NVPTX architectures support the generic
address space.
I note there's a FIXME in the target's address space map about 'generic'
still having to be added to the target but we haven't observed any
issues with it downstream. The generic address space is mapped to the
same target address space as default/private (0), but this isn't
necessarily a problem for users.
This patch adds initial support for the recently announced Armv9
Cortex-A320 processor.
For more information, including the Technical Reference Manual, see:
https://developer.arm.com/Processors/Cortex-A320
---------
Co-authored-by: Oliver Stannard <oliver.stannard@arm.com>
This patch adds a new flag, -ftime-report-json, which outputs the same
information as -ftime-report but as JSON instead of -ftime-report's
pretty printed format.
This introduces three things related to intialization like:
char buf[3] = "foo";
where the array does not declare enough space for the null terminator
but otherwise can represent the array contents exactly.
1) An attribute named 'nonstring' which can be used to mark that a
field or variable is not intended to hold string data.
2) -Wunterminated-string-initialization, which is grouped under
-Wextra, and diagnoses the above construct unless the declaration
uses the 'nonstring' attribute.
3) -Wc++-unterminated-string-initialization, which is grouped under
-Wc++-compat, and diagnoses the above construct even if the
declaration uses the 'nonstring' attribute.
Fixes#137705
This introduces a new diagnostic group to diagnose implicit casts from
int to an enumeration type. In C, this is valid, but it is not
compatible with C++.
Additionally, this moves the "implicit conversion from enum type to
different enum type" diagnostic from `-Wenum-conversion` to a new group
`-Wimplicit-enum-enum-cast`, which is a more accurate home for it.
`-Wimplicit-enum-enum-cast` is also under `-Wimplicit-int-enum-cast`, as
it is the same incompatibility (the enumeration on the right-hand is
promoted to `int`, so it's an int -> enum conversion).
Fixes#37027
The recently announced IBM z17 processor implements the architecture
already supported as "arch15" in LLVM. This patch adds support for "z17"
as an alternate architecture name for arch15.
This patch also add the scheduler description for the z17 processor,
provided by Jonas Paulsson.
This is a follow-up to #132129.
Currently, only `Parser` and `SemaBase` get a `DiagCompat()` helper; I’m
planning to keep refactoring compatibility warnings and add more helpers
to other classes as needed. I also refactored a single parser compat
warning just to make sure everything works properly when diagnostics
across multiple components (i.e. Sema and Parser in this case) are
involved.
This introduces some tablegen helpers for defining compatibility
warnings. The main aim of this is to both simplify adding new
compatibility warnings as well as to unify the naming of compatibility
warnings.
I’ve refactored ~half of the compatiblity warnings (that follow the
usual scheme) in `DiagnosticSemaKinds.td` for illustration purposes and
also to simplify/unify the wording of some of them (I also corrected a
typo in one of them as a drive-by fix).
I haven’t (yet) migrated *all* warnings even in that one file, and there
are some more specialised ones for which the scheme I’ve established
here doesn’t work (e.g. because they’re warning+error instead of
warning+extwarn; however, warning+extension *is* supported), but the
point of this isn’t to implement *all* compatibility-related warnings
this way, only to make the common case a bit easier to handle.
This currently also only handles C++ compatibility warnings, but it
should be fairly straight-forward to extend the tablegen code so it can
also be used for C compatibility warnings (if this gets merged, I’m
planning to do that in a follow-up pr).
The vast majority of compatibility warnings are emitted by writing
```c++
Diag(Loc, getLangOpts().CPlusPlusYZ ? diag::ext_... : diag::warn_...)
```
in accordance with which I’ve chosen the following naming scheme:
```c++
Diag(Loc, getLangOpts().CPlusPlusYZ ? diag::compat_cxxyz_foo : diag::compat_pre_cxxyz_foo)
```
That is, for a warning about a C++20 feature—i.e. C++≤17
compatibility—we get:
```c++
Diag(Loc, getLangOpts().CPlusPlus20 ? diag::compat_cxx20_foo : diag::compat_pre_cxx20_foo)
```
While there is an argument to be made against writing ‘`compat_cxx20`’
here since is technically a case of ‘C++17 compatibility’ and not ‘C++20
compatibility’, I at least find this easier to reason about, because I
can just write the same number 3 times instead of having to use
`ext_cxx20_foo` but `warn_cxx17_foo`. Instead, I like to read this as a
warning about the ‘compatibility *of* a C++20 feature’ rather than
‘*with* C++17’.
I also experimented with moving all compatibility warnings to a separate
file, but 1. I don’t think it’s worth the effort, and 2. I think it
hurts compile times a bit because at least in my testing I felt that I
had to recompile more code than if we just keep e.g. Sema-specific
compat warnings in the Sema diagnostics file.
Instead, I’ve opted to put them all in the same place within any one
file; currently this is a the very top but I don’t really have strong
opinions about this.
This reverts commit 31ebe6647b7f1fc7f6778a5438175b12f82357ae.
The reason for the test failure is likely due to
`Name2PairMap::getTimerGroup(...)` not holding a lock.
Additionally, remove the behavior for both pass manager's timer manager
classes (`PassTimingInfo` for the old pass manager and
`TimePassesHandler` for the new pass manager) where these classes would
print the values of their timers upon destruction.
Currently, each pass manager manages their own `TimerGroup`s. This is
problematic because of duplicate `TimerGroup`s (both pass managers have
a `TimerGroup` for pass times with identical names and descriptions).
The result is that in Clang, `-ftime-report` has two "Pass execution
timing report" sections (one for the new pass manager which manages
optimization passes, and one for the old pass manager which manages the
backend). The result of this change is that Clang's `-ftime-report` now
prints both optimization and backend pass timing info in a unified "Pass
execution timing report" section.
Moving the ownership of the `TimerGroups` to globals also makes it
easier to implement JSON-formatted `-ftime-report`. This was not
possible with the old structure because the two pass managers were
created and destroyed in far parts of the codebase and outputting JSON
requires the printing logic to be at the same place because of
formatting.
Previous discourse discussion:
https://discourse.llvm.org/t/difficulties-with-implementing-json-formatted-ftime-report/84353
Instead of manually adding a note pointing to the relevant template
parameter to every relevant error, which is very easy to miss, this
patch adds a new instantiation context note, so that this can work using
RAII magic.
This fixes a bunch of places where these notes were missing, and is more
future-proof.
Some diagnostics are reworked to make better use of this note:
- Errors about missing template arguments now refer to the parameter
which is missing an argument.
- Template Template parameter mismatches now refer to template
parameters as parameters instead of arguments.
It's likely this will add the note to some diagnostics where the
parameter is not super relevant, but this can be reworked with time and
the decrease in maintenance burden makes up for it.
This bypasses the templight dumper for the new context entry, as the
tests are very hard to update.
This depends on #125453, which is needed to avoid losing the context
note for errors occuring during template argument deduction.
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all non-documentation occurrences of gfx940/gfx941 from
the llvm directory, and the remaining occurrences in clang.
Documentation changes will follow.
For SWDEV-512631
gfx940 and gfx941 are no longer supported. This is one of a series of
PRs to remove them from the code base.
This PR removes all occurrences of gfx940/gfx941 from clang that can be
removed without changes in the llvm directory. The
target-invalid-cpu-note/amdgcn.c test is not included here since it
tests a list of targets that is defined in
llvm/lib/TargetParser/TargetParser.cpp.
For SWDEV-512631
apple-a18 is an alias of apple-m4.
apple-s6/s7/s8 are aliases of apple-a13.
apple-s9/s10 are aliases of apple-a16.
As with some other aliases today, this reflects identical ISA feature
support, but not necessarily identical microarchitectures and
performance characteristics.
We add a generic out-of-order CPU model here just like what GCC
has done.
People may use this model to evaluate some optimizations, and more
importantly, people can use this model as a template to customize
their own CPU models.
The design (units, cycles, ...) of this model is random so don't
take it seriously.
In the discussion around #116792, @rjmccall mentioned that ARCMigrate
has been obsoleted and that we could go ahead and remove it from Clang,
so this patch does just that.