This reverts commit 047db150c66e245e9df7db178b893ce6b29820f5.
The original version of the commit caused assertion failures in DSE.
Those were fixed in ec059d81aafedb253a02d6f490ad9b9747611038, so trying
to reland this again.
This helps to clean up any dead stores that come up at the end of the
destructor. The motivating example was a refactoring in libc++'s
basic_string implementation in 8dae17be2991cd7f0d7fd9aa5aecd064520a14f6
that added a zeroing store into the destructor, causing a large
performance regression on an internal workload. We also saw a ~0.2%
performance increase on an internal server workload when enabling this.
I also tested this against all of the non-flaky tests in our large C++
codebase and found a minimal number of issues that all happened to be in
user code.
The conversion code always ended up just getting the type of Src from
the Src argument itself, with no virtual users of this, so there is no
point in also providing this API hook. Fix the documentation as well,
since it seems DestAddr must have been similarly removed at some point
in the past from the API but was still documented.
Also fixes CIR to actually return the casted value!
Convert "denormal-fp-math" and "denormal-fp-math-f32" into a first
class denormal_fpenv attribute. Previously the query for the effective
denormal mode involved two string attribute queries with parsing. I'm
introducing more uses of this, so it makes sense to convert this
to a more efficient encoding. The old representation was also awkward
since it was split across two separate attributes. The new encoding
just stores the default and float modes as bitfields, largely avoiding
the need to consider if the other mode is set.
The syntax in the common cases looks like this:
`denormal_fpenv(preservesign,preservesign)`
`denormal_fpenv(float: preservesign,preservesign)`
`denormal_fpenv(dynamic,dynamic float: preservesign,preservesign)`
I wasn't sure about reusing the float type name instead of adding a
new keyword. It's parsed as a type but only accepts float. I'm also
debating switching the name to subnormal to match the current
preferred IEEE terminology (also used by nofpclass and other
contexts).
This has a behavior change when using the command flag debug
options to set the denormal mode. The behavior of the flag
ignored functions with an explicit attribute set, per
the default and f32 version. Now that these are one attribute,
the flag logic can't distinguish which of the two components
were explicitly set on the function. Only one test appeared to
rely on this behavior, so I just avoided using the flags in it.
This also does not perform all the code cleanups this enables.
In particular the attributor handling could be cleaned up.
I also guessed at how to support this in MLIR. I followed
MemoryEffects as a reference; it appears bitfields are expanded
into arguments to attributes, so the representation there is
a bit uglier with the 2 2-element fields flattened into 4 arguments.
This patch makes the dead_on_return parameter attribute optionally require a number
of bytes to be passed in to specify the number of bytes known to be dead
upon function return/unwind. This is aimed at enabling annotating the
this pointer in C++ destructors with dead_on_return in clang. We need
this to handle cases like the following:
```
struct X {
int n;
~X() {
this[n].n = 0;
}
};
void f() {
X xs[] = {42, -1};
}
```
Where we only certain that sizeof(X) bytes are dead upon return of ~X.
Otherwise DSE would be able to eliminate the store in ~X which would not
be correct.
This patch only does the wiring within IR. Future patches will make
clang emit correct sizing information and update DSE to only delete
stores to objects marked dead_on_return that are provably in bounds of
the number of bytes specified to be dead_on_return.
Reviewers: nikic, alinas, antoniofrighetto
Pull Request: https://github.com/llvm/llvm-project/pull/171712
Since 8c4950951269ec58296afbeba14e99aef467f84d,
getCanonicalTypeUnqualified() calls getUnqualifiedType(), so there's no
point in calling that again on its return value.
We have several issues describing suboptimal stack usage related to the
lifetimes of temporary objects, such as #68747, #43598, and #109204.
Previously, https://reviews.llvm.org/D74094 tried to address this. In
that review, a few issues were brought up, particularly a concern about
the lifetimes of the temporaries needing to be extended to end of the
full expression. While there are arguably more optimal lifetime bounds
we could enforce, for now we can conservatively make them extend to the
end of the full expression, and later refine the optimization to use
tighter bounds (or perhaps a better mechanism in the middle end?).
Fixes#68747
Co-authored-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Co-authored-by: Erik Pilkington <erik.pilkington@gmail.com>
---------
Co-authored-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Co-authored-by: Erik Pilkington <erik.pilkington@gmail.com>
This provides a C language `modular_format` attribute. This combines
with information from the existing `format` to set the new IR
`modular-format` attribute.
The purpose of these attributes is to enable "modular printf". A
statically linked libc can provide a modular variant of printf that only
weakly references implementation routines. The link-time symbol `printf`
would strongly reference aspect symbols (e.g. for float, fixed point,
etc.) that are provided by those routines, restoring the status quo.
However, the compiler could transform calls with constant format strings
to calls to the modular printf instead, and at the same time, it would
emit strong references to the aspect symbols that are needed to
implement the format string. Then, the printf implementation would
contain only the union of the aspects requested.
See issue #146159 for context.
This patch adds a new `FramePointerKind::NonLeafNoReserve` and makes it
the default for `-momit-leaf-frame-pointer`.
It also adds a new commandline option `-m[no-]reserve-frame-pointer-reg`.
This should fix#154379, the main impact of this patch can be found in
`clang/lib/Driver/ToolChains/CommonArgs.cpp`.
`DISubprogram`s are attached to call sites to support various debug info
features, including entry values and tail calls. Clang 9.0
(0f6516856670a435461f56a9faeb4aa8a35a6679) was the first version to
include this kind of call site `DISubprogram` attachment.
This earlier work appears to visit only some call site variants,
however. The call site attachment was added to a higher-level `EmitCall`
path in Clang's code gen that is only used by some call variants. In
particular, some C++ member calls use a different code gen path, which
did not include this call site attachment step, and thus the debug info
it triggers (e.g. call site entries) was not emitted for such calls.
This moves `DISubprogram` attachment to a lower-level call emission path
that is used by all call variants.
Fixes https://github.com/llvm/llvm-project/issues/161962
Create and add generalized type identifier metadata to indirect calls,
and to functions which are potential indirect call targets.
The functions carry the !type metadata. The indirect callsites carry a
list of !type metadata values under !callee_type metadata.
RFC:
https://discourse.llvm.org/t/rfc-call-graph-information-from-clang-llvm-for-c-c/88255
This rename was made as part of
https://github.com/llvm/llvm-project/pull/147835 in order to ease
rebasing the PR, and give a nice window for other patches to get rebased
as well.
It has been a while already, so lets go ahead and rename it back.
When arg memory effects are lost due to indirect arguments, apply
readonly/readnone attribute to the other pointer arguments of the
function.
Fixes#157693 .
In SPIR-V, kernel arguments are not allowed to be in the Generic AS, in
both Intel's internal SPIR-V offloading implementation as well as
HIPSPV, `CrossWorkgroup` AS1 is used. Do the same for OMPSPV.
Currently with Generic AS the `llvm-spirv` translator blows up if we are
using it, and if not, the GPU runtime blows up.
To get the existing logic to set the correct AS to kick in, we need to
know if the function is a kernel or not at the time we first create the
function that may end up as the kernel.
I use the existing `arrangeSYCLKernelCallerDeclaration` function to do
the right kernel ABI computation, but since the function is not specific
to SYCL anymore because I merged all the device kernel clang attributes
into one.
Rename the function to be accurate to the current behavior,
`arrangeDeviceKernelCallerDeclaration`.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
For structured bindings that use custom `get` specializations, the
resulting LLVM IR ascribes the load of the result of `get` to the
binding's declaration, rather than the place where the binding is
referenced - this caused awkward sequencing in the debug info where,
when stepping through the code you'd step back to the binding
declaration every time there was a reference to the binding.
To fix that - when we cross into IRGening a binding - suppress the debug
info location of that subexpression.
I don't represent this as a great bit of API design - certainly open to
ideas, but putting it out here as a place to start.
It's /possible/ this is an incomplete fix, even - if the binding decl
had other subexpressions, those would still get their location applied &
it'd likely be wrong.
So maybe that's a direction to go with to productionize this - add a new
location scoped device that suppresses any overriding - this might be
more robust. How do people feel about that?
Remove "approx-func-fp-math" attribute and related command line option,
users should always use afn flag in IR.
Resolve FIXME in `TargetMachine::resetTargetOptions` partially.
This changes a bunch of places which use getAs<TagType>, including
derived types, just to obtain the tag definition.
This is preparation for #155028, offloading all the changes that PR used
to introduce which don't depend on any new helpers.
This patch handle struct of fixed vector and struct of array of fixed
vector correctly for VLS calling convention in EmitFunctionProlog,
EmitFunctionEpilog and EmitCall.
stack on: https://github.com/llvm/llvm-project/pull/147173
This is a major change on how we represent nested name qualifications in
the AST.
* The nested name specifier itself and how it's stored is changed. The
prefixes for types are handled within the type hierarchy, which makes
canonicalization for them super cheap, no memory allocation required.
Also translating a type into nested name specifier form becomes a no-op.
An identifier is stored as a DependentNameType. The nested name
specifier gains a lightweight handle class, to be used instead of
passing around pointers, which is similar to what is implemented for
TemplateName. There is still one free bit available, and this handle can
be used within a PointerUnion and PointerIntPair, which should keep
bit-packing aficionados happy.
* The ElaboratedType node is removed, all type nodes in which it could
previously apply to can now store the elaborated keyword and name
qualifier, tail allocating when present.
* TagTypes can now point to the exact declaration found when producing
these, as opposed to the previous situation of there only existing one
TagType per entity. This increases the amount of type sugar retained,
and can have several applications, for example in tracking module
ownership, and other tools which care about source file origins, such as
IWYU. These TagTypes are lazily allocated, in order to limit the
increase in AST size.
This patch offers a great performance benefit.
It greatly improves compilation time for
[stdexec](https://github.com/NVIDIA/stdexec). For one datapoint, for
`test_on2.cpp` in that project, which is the slowest compiling test,
this patch improves `-c` compilation time by about 7.2%, with the
`-fsyntax-only` improvement being at ~12%.
This has great results on compile-time-tracker as well:

This patch also further enables other optimziations in the future, and
will reduce the performance impact of template specialization resugaring
when that lands.
It has some other miscelaneous drive-by fixes.
About the review: Yes the patch is huge, sorry about that. Part of the
reason is that I started by the nested name specifier part, before the
ElaboratedType part, but that had a huge performance downside, as
ElaboratedType is a big performance hog. I didn't have the steam to go
back and change the patch after the fact.
There is also a lot of internal API changes, and it made sense to remove
ElaboratedType in one go, versus removing it from one type at a time, as
that would present much more churn to the users. Also, the nested name
specifier having a different API avoids missing changes related to how
prefixes work now, which could make existing code compile but not work.
How to review: The important changes are all in
`clang/include/clang/AST` and `clang/lib/AST`, with also important
changes in `clang/lib/Sema/TreeTransform.h`.
The rest and bulk of the changes are mostly consequences of the changes
in API.
PS: TagType::getDecl is renamed to `getOriginalDecl` in this patch, just
for easier to rebasing. I plan to rename it back after this lands.
Fixes#136624
Fixes https://github.com/llvm/llvm-project/issues/43179
Fixes https://github.com/llvm/llvm-project/issues/68670
Fixes https://github.com/llvm/llvm-project/issues/92757
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
Fixes#148063 by preventing the ByVal attribute from being placed on out
and inout function parameters which causes them to be eliminated by the
Dead Store Elimination (DSE) pass.
The checks for the 'z' and 't' format specifiers added in the original
PR #143653 had some issues and were overly strict, causing some build
failures and were consequently reverted at
4c85bf2fe8.
In the latest commit
27c58629ec,
I relaxed the checks for the 'z' and 't' format specifiers, so warnings
are now only issued when they are used with mismatched types.
The original intent of these checks was to diagnose code that assumes
the underlying type of `size_t` is `unsigned` or `unsigned long`, for
example:
```c
printf("%zu", 1ul); // Not portable, but not an error when size_t is unsigned long
```
However, it produced a significant number of false positives. This was
partly because Clang does not treat the `typedef` `size_t` and
`__size_t` as having a common "sugar" type, and partly because a large
amount of existing code either assumes `unsigned` (or `unsigned long`)
is `size_t`, or they define the equivalent of size_t in their own way
(such as
sanitizer_internal_defs.h).2e67dcfdcd/compiler-rt/lib/sanitizer_common/sanitizer_internal_defs.h (L203)
Let Clang emit `dead_on_return` attribute on pointer arguments
that are passed indirectly, namely, large aggregates that the
ABI mandates be passed by value; thus, the parameter is destroyed
within the callee. Writes to such arguments are not observable by
the caller after the callee returns.
This should desirably enable further MemCpyOpt/DSE optimizations.
Previous discussion: https://discourse.llvm.org/t/rfc-add-dead-on-return-attribute/86871.
Including the results of `sizeof`, `sizeof...`, `__datasizeof`,
`__alignof`, `_Alignof`, `alignof`, `_Countof`, `size_t` literals, and
signed `size_t` literals, the results of pointer-pointer subtraction and
checks for standard library functions (and their calls).
The goal is to enable clang and downstream tools such as clangd and
clang-tidy to provide more portable hints and diagnostics.
The previous discussion can be found at #136542.
This PR implements this feature by introducing a new subtype of `Type`
called `PredefinedSugarType`, which was considered appropriate in
discussions. I tried to keep `PredefinedSugarType` simple enough yet not
limited to `size_t` and `ptrdiff_t` so that it can be used for other
purposes. `PredefinedSugarType` wraps a canonical `Type` and provides a
name, conceptually similar to a compiler internal `TypedefType` but
without depending on a `TypedefDecl` or a source file.
Additionally, checks for the `z` and `t` format specifiers in format
strings for `scanf` and `printf` were added. It will precisely match
expressions using `typedef`s or built-in expressions.
The affected tests indicates that it works very well.
Several code require that `SizeType` is canonical, so I kept `SizeType`
to its canonical form.
The failed tests in CI are allowed to fail. See the
[comment](https://github.com/llvm/llvm-project/pull/135386#issuecomment-3049426611)
in another PR #135386.
(This is a re-do of #138972, which had a minor warning in `Clang.cpp`.)
This PR adds some of the support needed for Windows hot-patching.
Windows implements a form of hot-patching. This allows patches to be
applied to Windows apps, drivers, and the kernel, without rebooting or
restarting any of these components. Hot-patching is a complex technology
and requires coordination between the OS, compilers, linkers, and
additional tools.
This PR adds support to Clang and LLVM for part of the hot-patching
process. It enables LLVM to generate the required code changes and to
generate CodeView symbols which identify hot-patched functions. The PR
provides new command-line arguments to Clang which allow developers to
identify the list of functions that need to be hot-patched. This PR also
allows LLVM to directly receive the list of functions to be modified, so
that language front-ends which have not yet been modified (such as Rust)
can still make use of hot-patching.
This PR:
* Adds a `MarkedForWindowsHotPatching` LLVM function attribute. This
attribute indicates that a function should be _hot-patched_. This
generates a new CodeView symbol, `S_HOTPATCHFUNC`, which identifies any
function that has been hot-patched. This attribute also causes accesses
to global variables to be indirected through a `_ref_*` global variable.
This allows hot-patched functions to access the correct version of a
global variable; the hot-patched code needs to access the variable in
the _original_ image, not the patch image.
* Adds a `AllowDirectAccessInHotPatchFunction` LLVM attribute. This
attribute may be placed on global variable declarations. It indicates
that the variable may be safely accessed without the `_ref_*`
indirection.
* Adds two Clang command-line parameters: `-fms-hotpatch-functions-file`
and `-fms-hotpatch-functions-list`. The `-file` flag may point to a text
file, which contains a list of functions to be hot-patched (one function
name per line). The `-list` flag simply directly identifies functions to
be patched, using a comma-separated list. These two command-line
parameters may also be combined; the final set of functions to be
hot-patched is the union of the two sets.
* Adds similar LLVM command-line parameters:
`--ms-hotpatch-functions-file` and `--ms-hotpatch-functions-list`.
* Adds integration tests for both LLVM and Clang.
* Adds support for dumping the new `S_HOTPATCHFUNC` CodeView symbol.
Although the flags are redundant between Clang and LLVM, this allows
additional languages (such as Rust) to take advantage of hot-patching
support before they have been modified to generate the required
attributes.
Credit to @dpaoliello, who wrote the original form of this patch.
This PR adds some of the support needed for Windows hot-patching.
Windows implements a form of hot-patching. This allows patches to be
applied to Windows apps, drivers, and the kernel, without rebooting or
restarting any of these components. Hot-patching is a complex technology
and requires coordination between the OS, compilers, linkers, and
additional tools.
This PR adds support to Clang and LLVM for part of the hot-patching
process. It enables LLVM to generate the required code changes and to
generate CodeView symbols which identify hot-patched functions. The PR
provides new command-line arguments to Clang which allow developers to
identify the list of functions that need to be hot-patched. This PR also
allows LLVM to directly receive the list of functions to be modified, so
that language front-ends which have not yet been modified (such as Rust)
can still make use of hot-patching.
This PR:
* Adds a `MarkedForWindowsHotPatching` LLVM function attribute. This
attribute indicates that a function should be _hot-patched_. This
generates a new CodeView symbol, `S_HOTPATCHFUNC`, which identifies any
function that has been hot-patched. This attribute also causes accesses
to global variables to be indirected through a `_ref_*` global variable.
This allows hot-patched functions to access the correct version of a
global variable; the hot-patched code needs to access the variable in
the _original_ image, not the patch image.
* Adds a `AllowDirectAccessInHotPatchFunction` LLVM attribute. This
attribute may be placed on global variable declarations. It indicates
that the variable may be safely accessed without the `_ref_*`
indirection.
* Adds two Clang command-line parameters: `-fms-hotpatch-functions-file`
and `-fms-hotpatch-functions-list`. The `-file` flag may point to a text
file, which contains a list of functions to be hot-patched (one function
name per line). The `-list` flag simply directly identifies functions to
be patched, using a comma-separated list. These two command-line
parameters may also be combined; the final set of functions to be
hot-patched is the union of the two sets.
* Adds similar LLVM command-line parameters:
`--ms-hotpatch-functions-file` and `--ms-hotpatch-functions-list`.
* Adds integration tests for both LLVM and Clang.
* Adds support for dumping the new `S_HOTPATCHFUNC` CodeView symbol.
Although the flags are redundant between Clang and LLVM, this allows
additional languages (such as Rust) to take advantage of hot-patching
support before they have been modified to generate the required
attributes.
Credit to @dpaoliello, who wrote the original form of this patch.
This extends https://github.com/llvm/llvm-project/pull/138577 to more UBSan checks, by changing SanitizerDebugLocation (formerly SanitizerScope) to add annotations if enabled for the specified ordinals.
Annotations will use the ordinal name if there is exactly one ordinal specified in the SanitizerDebugLocation; otherwise, it will use the handler name.
Updates the tests from https://github.com/llvm/llvm-project/pull/141814.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
We have multiple different attributes in clang representing device
kernels for specific targets/languages. Refactor them into one attribute
with different spellings to make it more easily scalable for new
languages/targets.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This patch is part of a stack that teaches Clang to generate Key Instructions
metadata for C and C++.
When returning a value, stores to the `retval` allocas and branches to `return`
block are put in the same atom group. They are both rank 1, which could in
theory introduce an extra step in some optimized code. This low risk currently
feels an acceptable for keeping the code a bit simpler (as opposed to adding
scaffolding to make the store rank 2).
In the case of a single return (no control flow) the return instruction inherits
the atom group of the branch to the return block when the blocks get folded
togather.
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
The feature is only functional in LLVM if LLVM is built with CMake flag
LLVM_EXPERIMENTAL_KEY_INSTRUCTIONs. Eventually that flag will be removed.
The InstrProf headers are very expensive. Avoid including them in all of
CodeGen/ by moving the CodeGenPGO member behind a unqiue_ptr.
This reduces clang build time by 0.8%.
Move the initialization of ptrauth-* function attributes near the
initialization of branch protection attributes. The semantics of these
groups of attributes partially overlaps, so handle both groups in
getDefaultFunctionAttributes() and setTargetAttributes() functions to
prevent getting them out of sync. This fixes C++ TLS wrappers.
For i1 vectors, we used an i8 fixed vector as the storage type.
If the known minimum number of elements of the scalable vector type is
less than 8, we were doing the cast through memory. This used a load or
store from a fixed vector alloca. If is less than 8, DataLayout
indicates that the load/store reads/writes vscale bytes even if vscale
is known and vscale*X is less than or equal to 8. This means the load or
store is outside the bounds of the fixed size alloca as far as
DataLayout is concerned leading to undefined behavior.
This patch avoids this by widening the i1 scalable vector type with zero
elements until it is divisible by 8. This allows it be bitcasted to/from
an i8 scalable vector. We then insert or extract the i8 fixed vector
into this type.
Hopefully this enables #130973 to be accepted.
Adds support for MSVC's undocumented `/funcoverride` flag, which marks
functions as being replaceable by the Windows kernel loader. This is
used to allow functions to be upgraded depending on the capabilities of
the current processor (e.g., the kernel can be built with the naive
implementation of a function, but that function can be replaced at boot
with one that uses SIMD instructions if the processor supports them).
For each marked function we need to generate:
* An undefined symbol named `<name>_$fo$`.
* A defined symbol `<name>_$fo_default$` that points to the `.data`
section (anywhere in the data section, it is assumed to be zero sized).
* An `/ALTERNATENAME` linker directive that points from `<name>_$fo$` to
`<name>_$fo_default$`.
This is used by the MSVC linker to generate the appropriate metadata in
the Dynamic Value Relocation Table.
Marked function must never be inlined (otherwise those inline sites
can't be replaced).
Note that I've chosen to implement this in AsmPrinter as there was no
way to create a `GlobalVariable` for `<name>_$fo$` that would result in
a symbol being emitted (as nothing consumes it and it has no
initializer). I tried to have `llvm.used` and `llvm.compiler.used` point
to it, but this didn't help.
Within LLVM I referred to this feature as "loader replaceable" as
"function override" already has a different meaning to C++ developers...
I also took the opportunity to extract the feature symbol generation
code used by both AArch64 and X86 into a common function in AsmPrinter.
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
Do not assume it's the alloca address space, we have an explicit
address space to use for the argument already. Also use the original
value's type instead of assuming DefaultAS.
This essentially reverts 1bf1a156d673.
OpenCL's handling of address spaces has always been a mess, but it's
better than it used to be so this hack appears to be unnecessary now.
None of the code here should really depend on the language or language
address space. The ABI address space to use is already explicit in the
ABIArgInfo, so use that instead of guessing it has anything to do with
LangAS::Default or getASTAllocaAddressSpace.
The below usage of LangAS::Default and getASTAllocaAddressSpace are also
suspect, but appears to be a more involved and separate fix.