This extends https://github.com/llvm/llvm-project/pull/138577 to more UBSan checks, by changing SanitizerDebugLocation (formerly SanitizerScope) to add annotations if enabled for the specified ordinals.
Annotations will use the ordinal name if there is exactly one ordinal specified in the SanitizerDebugLocation; otherwise, it will use the handler name.
Updates the tests from https://github.com/llvm/llvm-project/pull/141814.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
This refactors existing code into a 'SanitizerInfoFromCFICheckKind'
helper function. This will be useful in future work to annotate CFI
checks with debug info
(https://github.com/llvm/llvm-project/pull/139809).
It isn't used and is redundant with the result pointer type argument.
A more reasonable API would only have LangAS parameters, or IR parameters,
not both. Not all values have a meaningful value for this. I'm also
not sure why we have this at all, it's not overridden by any targets and
further simplification is possible.
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120464) nearly works for CFI:
code that calls EmitCheck will already check the merge options. This
patch fixes one EmitTrapCheck call, which did not check the merge
options, and for two other EmitTrapChecks, adds two TODOs that explain
why it is difficult to fix them.
The qualifier allows programmer to directly control how pointers are
signed when they are stored in a particular variable.
The qualifier takes three arguments: the signing key, a flag specifying
whether address discrimination should be used, and a non-negative
integer that is used for additional discrimination.
```
typedef void (*my_callback)(const void*);
my_callback __ptrauth(ptrauth_key_process_dependent_code, 1, 0xe27a) callback;
```
Co-Authored-By: John McCall rjmccall@apple.com
Finding operator delete[] is still problematic, without it the extension
is a security hazard, so reverting until the problem with operator
delete[] is figured out.
This reverts the following PRs:
Reland [MS][clang] Add support for vector deleting destructors (llvm#133451)
[MS][clang] Make sure vector deleting dtor calls correct operator delete (llvm#133950)
[MS][clang] Fix crash on deletion of array of pointers (llvm#134088)
[clang] Do not diagnose unused deleted operator delete[] (llvm#134357)
[MS][clang] Error about ambiguous operator delete[] only when required (llvm#135041)
During additional testing I spotted that vector deleting dtor calls
operator delete, not operator delete[] when performing array deletion.
This patch fixes that.
Whereas it is UB in terms of the standard to delete an array of objects
via pointer whose static type doesn't match its dynamic type, MSVC
supports an extension allowing to do it.
Aside from array deletion not working correctly in the mentioned case,
currently not having this extension implemented causes clang to generate
code that is not compatible with the code generated by MSVC, because
clang always puts scalar deleting destructor to the vftable. This PR
aims to resolve these problems.
It was reverted due to link time errors in chromium with sanitizer
coverage enabled,
which is fixed by https://github.com/llvm/llvm-project/pull/131929 .
The second commit of this PR also contains a fix for a runtime failure
in chromium reported
in
https://github.com/llvm/llvm-project/pull/126240#issuecomment-2730216384
.
Fixes https://github.com/llvm/llvm-project/issues/19772
This caused link errors when building with sancov. See comment on the PR.
> Whereas it is UB in terms of the standard to delete an array of objects
> via pointer whose static type doesn't match its dynamic type, MSVC
> supports an extension allowing to do it.
> Aside from array deletion not working correctly in the mentioned case,
> currently not having this extension implemented causes clang to generate
> code that is not compatible with the code generated by MSVC, because
> clang always puts scalar deleting destructor to the vftable. This PR
> aims to resolve these problems.
>
> Fixes https://github.com/llvm/llvm-project/issues/19772
This reverts commit d6942d54f677000cf713d2b0eba57b641452beb4.
Whereas it is UB in terms of the standard to delete an array of objects
via pointer whose static type doesn't match its dynamic type, MSVC
supports an extension allowing to do it.
Aside from array deletion not working correctly in the mentioned case,
currently not having this extension implemented causes clang to generate
code that is not compatible with the code generated by MSVC, because
clang always puts scalar deleting destructor to the vftable. This PR
aims to resolve these problems.
Fixes https://github.com/llvm/llvm-project/issues/19772
This intrinsic is used when whole program vtables is used in conjunction
with either CFI or virtual function elimination. The
`llvm.type.checked.load` is unconditionally used, but we need to use the
relative intrinsic for WPD and CFI to work correctly.
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
This reverts commit 81fc3add1e627c23b7270fe2739cdacc09063e54.
This breaks some LLDB tests, e.g.
SymbolFile/DWARF/x86/no_unique_address-with-bitfields.cpp:
lldb: ../llvm-project/clang/lib/AST/Decl.cpp:4604: unsigned int clang::FieldDecl::getBitWidthValue() const: Assertion `isa<ConstantExpr>(getBitWidth())' failed.
Save the bitwidth value as a `ConstantExpr` with the value set. Remove
the `ASTContext` parameter from `getBitWidthValue()`, so the latter
simply returns the value from the `ConstantExpr` instead of
constant-evaluating the bitwidth expression every time it is called.
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.
Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.
`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.
The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.
The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.
When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.
From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.
After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.
The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.
The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
In incremental compilation clang works with multiple `llvm::Module`s.
Our current approach is to create a CodeGenModule entity for every new
module request (via StartModule). However, some of the state such as the
mangle context needs to be preserved to keep the original semantics in
the ever-growing TU.
Fixes: llvm/llvm-project#95581.
cc: @jeaye
This is a follow-up from the conversation starting at
https://github.com/llvm/llvm-project/pull/93809#issuecomment-2173729801
The root problem that motivated the change are external AST sources that
compute `ASTRecordLayout`s themselves instead of letting Clang compute
them from the AST. One such example is LLDB using DWARF to get the
definitive offsets and sizes of C++ structures. Such layouts should be
considered correct (modulo buggy DWARF), but various assertions and
lowering logic around the `CGRecordLayoutBuilder` relies on the AST
having `[[no_unique_address]]` attached to them. This is a
layout-altering attribute which is not encoded in DWARF. This causes us
LLDB to trip over the various LLVM<->Clang layout consistency checks.
There has been precedent for avoiding such layout-altering attributes
from affecting lowering with externally-provided layouts (e.g., packed
structs).
This patch proposes to replace the `isZeroSize` checks in
`CGRecordLayoutBuilder` (which roughly means "empty field with
[[no_unique_address]]") with checks for
`CodeGen::isEmptyField`/`CodeGen::isEmptyRecord`.
**Details**
The main strategy here was to change the `isZeroSize` check in
`CGRecordLowering::accumulateFields` and
`CGRecordLowering::accumulateBases` to use the `isEmptyXXX` APIs
instead, preventing empty fields from being added to the `Members` and
`Bases` structures. The rest of the changes fall out from here, to
prevent lookups into these structures (for field numbers or base
indices) from failing.
Added `isEmptyRecordForLayout` and `isEmptyFieldForLayout` (open to
better naming suggestions). The main difference to the existing
`isEmptyRecord`/`isEmptyField` APIs, is that the `isEmptyXXXForLayout`
counterparts don't have special treatment for `unnamed bitfields`/arrays
and also treat fields of empty types as if they had
`[[no_unique_address]]` (i.e., just like the `AsIfNoUniqueAddr` in
`isEmptyField` does).
Virtual function pointer entries in v-tables are signed with address
discrimination in addition to declaration-based discrimination, where an
integer discriminator the string hash (see
`ptrauth_string_discriminator`) of the mangled name of the overridden
method. This notably provides diversity based on the full signature of
the overridden method, including the method name and parameter types.
This patch introduces ItaniumVTableContext logic to find the original
declaration of the overridden method.
On AArch64, these pointers are signed using the `IA` key (the
process-independent code key.)
V-table pointers can be signed with either no discrimination, or a
similar scheme using address and decl-based discrimination. In this
case, the integer discriminator is the string hash of the mangled
v-table identifier of the class that originally introduced the vtable
pointer.
On AArch64, these pointers are signed using the `DA` key (the
process-independent data key.)
Not using discrimination allows attackers to simply copy valid v-table
pointers from one object to another. However, using a uniform
discriminator of 0 does have positive performance and code-size
implications on AArch64, and diversity for the most important v-table
access pattern (virtual dispatch) is already better assured by the
signing schemas used on the virtual functions. It is also known that
some code in practice copies objects containing v-tables with `memcpy`,
and while this is not permitted formally, it is something that may be
invasive to eliminate.
This is controlled by:
```
-fptrauth-vtable-pointer-type-discrimination
-fptrauth-vtable-pointer-address-discrimination
```
In addition, this provides fine-grained controls in the
ptrauth_vtable_pointer attribute, which allows overriding the default
ptrauth schema for vtable pointers on a given class hierarchy, e.g.:
```
[[clang::ptrauth_vtable_pointer(no_authentication, no_address_discrimination,
no_extra_discrimination)]]
[[clang::ptrauth_vtable_pointer(default_key, default_address_discrimination,
custom_discrimination, 0xf00d)]]
```
The override is then mangled as a parametrized vendor extension:
```
"__vtptrauth" I
<key>
<addressDiscriminated>
<extraDiscriminator>
E
```
To support this attribute, this patch adds a small extension to the
attribute-emitter tablegen backend.
Note that there are known areas where signing is either missing
altogether or can be strengthened. Some will be addressed in later
changes (e.g., member function pointers, some RTTI).
`dynamic_cast` in particular is handled by emitting an artificial
v-table pointer load (in a way that always authenticates it) before the
runtime call itself, as the runtime doesn't have enough information
today to properly authenticate it. Instead, the runtime is currently
expected to strip the v-table pointer.
---------
Co-authored-by: John McCall <rjmccall@apple.com>
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
This is in effect a revert of f139ae3d93797, as we have since gained a
more sophisticated way of doing extra IRGen with the addition of
RawAddress in #86923.
Hit this when trying upgrade an old project of mine. I couldn't find a
corresponding existing issue for this when spelunking the open issues
here on github.
Thankfully I can work-around it today with the `[[clang::no_destroy]]`
attribute for my use case. However it should still be properly fixed.
### Issue and History ###
https://godbolt.org/z/EYnhce8MK for reference.
All subsequent text below refers to the example in the godbolt above.
Anonymous unions never have their destructor invoked automatically.
Therefore we can skip vtable initialization of the destructor of a
dynamic class if that destructor effectively does no work.
This worked previously as the following check would be hit and return
true for the trivial anonymous union,
https://github.com/llvm/llvm-project/blob/release/18.x/clang/lib/CodeGen/CGClass.cpp#L1348,
resulting in the code skipping vtable initialization.
This was broken here
982bbf404e
in relation to comments made on this review here
https://reviews.llvm.org/D10508.
### Fixes ###
The check the code is doing is correct however the return value is
inverted. We want to return true here since a field with anonymous union
never has its destructor invoked and thus effectively has a trivial
destructor body from the perspective of requiring vtable init in the
parent dynamic class.
Also added some extra missing unit tests to test for this use case and a
couple others.
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was
reverted because it broke ubsan bots. There seems to be a bug in
coroutine code-gen, which is causing EmitTypeCheck to use the wrong
alignment. For now, pass alignment zero to EmitTypeCheck so that it can
compute the correct alignment based on the passed type (see function
EmitCXXMemberOrOperatorMemberCallExpr).
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
The goal of this change is to clean up some of the code surrounding
HLSL using CXXThisExpr as a non-pointer l-value. This change cleans up
a bunch of assumptions and inconsistencies around how the type of
`this` is handled through the AST and code generation.
This change is be mostly NFC for HLSL, and completely NFC for other
language modes.
This change introduces a new member to query for the this object's type
and seeks to clarify the normal usages of the this type.
With the introudction of HLSL to clang, CXXThisExpr may now be an
l-value and behave like a reference type rather than C++'s normal
method of it being an r-value of pointer type.
With this change there are now three ways in which a caller might need
to query the type of `this`:
* The type of the `CXXThisExpr`
* The type of the object `this` referrs to
* The type of the implicit (or explicit) `this` argument
This change codifies those three ways you may need to query
respectively as:
* CXXMethodDecl::getThisType()
* CXXMethodDecl::getThisObjectType()
* CXXMethodDecl::getThisArgType()
This change then revisits all uses of `getThisType()`, and in cases
where the only use was to resolve the pointee type, it replaces the
call with `getThisObjectType()`. In other cases it evaluates whether
the desired returned type is the type of the `this` expr, or the type
of the `this` function argument. The `this` expr type is used for
creating additional expr AST nodes and for member lookup, while the
argument type is used mostly for code generation.
Additionally some cases that used `getThisType` in simple queries could
be substituted for `getThisObjectType`. Since `getThisType` is
implemented in terms of `getThisObjectType` calling the later should be
more efficient if the former isn't needed.
Reviewed By: aaron.ballman, bogner
Differential Revision: https://reviews.llvm.org/D159247
Partial progress towards replacing uses of CreateElementBitCast, as it
no longer does what its name suggests.
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D154229
Partial progress towards replacing in-tree uses of `Type::getPointerTo()`.
This needs to be done before deprecating the API.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D152321