Commit 44e875ad5b2ce26826dd53f9e7d1a71436c86212 introduced a change that
replaces `ReplaceInstWithInst` with `Instruction::replaceAllUsesWith`,
without subsequent instruction cleanup.
This results in TSan leaving behind useless `load atomic` instructions
after 'replacing' them.
This commit adds cleanup back, consistent with the context.
Seeing how we can't generate any debug intrinsics any more: delete a
variety of codepaths where they're handled. For the most part these are
plain deletions, in others I've tweaked comments to remain coherent, or
added a type to (what was) type-generic-lambdas.
This isn't all the DbgInfoIntrinsic call sites but it's most of the
simple scenarios.
Co-authored-by: Nikita Popov <github@npopov.com>
This PR is based on my last PR #132752 (the first commit of this PR),
but addressing a different issue.
This commit addresses the limitation in `PointerMayBeCaptured` analysis
when dealing with derived pointers (e.g. arr+1) as described in issue
#132739.
The current implementation of `PointerMayBeCaptured` may miss captures
of the underlying `alloca` when analyzing derived pointers, leading to
some FNs in TSan, as follows:
```cpp
void *Thread(void *a) {
((int*)a)[1] = 43;
return 0;
}
int main() {
int Arr[2] = {41, 42};
pthread_t t;
pthread_create(&t, 0, Thread, &Arr[0]);
// Missed instrumentation here due to the FN of PointerMayBeCaptured
Arr[1] = 43;
barrier_wait(&barrier);
pthread_join(t, 0);
}
```
Refer to this [godbolt page](https://godbolt.org/z/n67GrxdcE) to get the
compilation result of TSan.
Even when `PointerMayBeCaptured` working correctly, it should backtrack
to the original `alloca` firstly during analysis, causing redundancy to
the outer's `findAllocaForValue`.
```cpp
const AllocaInst *AI = findAllocaForValue(Addr);
// Instead of Addr, we should check whether its base pointer is captured.
if (AI && !PointerMayBeCaptured(Addr, true)) ...
```
Key changes:
Directly analyze the capture status of the underlying `alloca` instead
of derived pointers to ensure accurate capture detection
```cpp
const AllocaInst *AI = findAllocaForValue(Addr);
// Instead of Addr, we should check whether its base pointer is captured.
if (AI && !PointerMayBeCaptured(AI, true)) ...
```
ThreadSanitizer.cpp and SanitizerBinaryMetadata.cpp previously used
`getUnderlyingObject` to check if pointers originate from stack objects.
However, `getUnderlyingObject()` by default only looks through linear
chains, not selects/phis. In particular, this means that we miss cases
involving pointer induction variables.
For instance,
```llvm
%stkobj = alloca [2 x i32], align 8
; getUnderlyingObject(%derived) = %derived
%derived = getelementptr inbounds i32, ptr %stkobj, i64 1
```
This will result in redundant instrumentation of TSan, resulting in
greater performance costs, especially when there are loops, referring to
this [godbolt page](https://godbolt.org/z/eaT1fPjTW) for details.
```cpp
char loop(int x) {
char buf[10];
char *p = buf;
for (int i = 0; i < x && i < 10; i++) {
// Should not instrument, as its base object is a non-captured stack
// variable.
// However, currectly, it is instrumented due to %p = %phi ...
*p++ = i;
}
// Use buf to prevent it from being eliminated by optimization
return buf[9];
}
```
There are TWO APIs `getUnderlyingObjectAggressive` and
`findAllocaForValue` that can backtrack the pointer via tree traversal,
supporting phis/selects.
This patch replaces `getUnderlyingObject` with `findAllocaForValue`
which:
1. Properly tracks through PHINodes and select operations
2. Directly identifies if a pointer comes from a `AllocaInst`
Performance impact:
- Compilation: Moderate cost increase due to wider value tracing, but...
- Runtime: Significant wins for code with pointer induction variables
derived from stack allocas, especially for loop-heavy code, as
instrumentation can now be safely omitted.
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
The implementation doesn't use it, and is unlikely to use it in
the future.
The places that do set StoreCaptures=false, do so incorrectly and
would be broken if the parameter actually did anything.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This PR changes the sanitizer passes to be idempotent.
When any sanitizer pass is run after it has already been run before,
double instrumentation is seen in the resulting IR. This happens because
there is no check in the pass, to verify if IR has been instrumented
before.
This PR checks if "nosanitize_*" module flag is already present and if
true, return early without running the pass again.
This is mainly for consistency across code paths, but also makes
sure that all calls use IRInstrumentationBuilder and its special
debuginfo handling.
The two remaining uses don't actually need RAUW, they just have
to erase the original instruction.
atomicrmw xchg also accepts pointer and floating-point values. To handle
those, insert necessary casts to and from integer. This is what we do
for cmpxchg as well.
Fixes https://github.com/llvm/llvm-project/issues/85226.
Certain instrumentations set the !nosanitize metadata for inserted
instructions, which are generally not interested for sanitizers. Skip
tsan instrumentation like we do for asan (D126294)/msan/hwasan.
-fprofile-arcs instrumentation has data race unless
-fprofile-update=atomic is specified. Let's remove the the `__llvm_gcov`
special case from commit 0222adbcd25779a156399bcc16fde9f6d083a809 (2016)
as the racy instructions have the !nosanitize metadata.
(-fprofile-arcs instrumentation does not use `__llvm_gcda` as global variables.)
```
std::atomic<int> c;
void foo() { c++; }
int main() {
std::thread th(foo);
c++;
th.join();
}
```
Tested that `clang++ --coverage -fsanitize=thread a.cc && ./a.out` does
not report spurious tsan errors.
Also remove the default CC1 option -fprofile-update=atomic for
-fsanitize=thread to make options more orthogonal.
Reviewed By: Enna1
Differential Revision: https://reviews.llvm.org/D158385
When building with debug info enabled, some load/store instructions do
not have a DebugLocation attached. When using the default IRBuilder, it
attempts to copy the DebugLocation from the insertion-point instruction.
When there's no DebugLocation, no attempt is made to add one.
Add a fallback DebugLocation with the help of InstrumentationIRBuilder for
memintrinsics. In particular, the compiler may optimize load/store without
debug info into memintrinsics, which then are missing debug info as well.
For the targets that have in their ABI the requirement that arguments and
return values are extended to the full register bitwidth, it is important
that calls when built also take care of this detail.
The OMPIRBuilder, AddressSanitizer, GCOVProfiling, MemorySanitizer and
ThreadSanitizer passes are with this patch hopefully now doing this properly.
Reviewed By: Eli Friedman, Ulrich Weigand, Johannes Doerfert
Differential Revision: https://reviews.llvm.org/D133949
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
This fixes clang.
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However this approach is
fragile, as slight changes in pipeline can return llvm intrinsics back.
In particular InstCombine can do that.
Msan/Asan already declare own version of these memory
functions for the similar purpose.
KCSAN, or anything that uses something else than compiler-rt, needs to
implement this callbacks.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D133268
After https://reviews.llvm.org/rG463aa814182a23 tsan replaces llvm
intrinsics with calls to glibc functions. However this approach is
fragile, as slight changes in pipeline can return llvm intrinsics back.
In particular InstCombine can do that.
Msan/Asan already declare own version of these memory
functions for the similar purpose.
KCSAN, or anything that uses something else than compiler-rt, needs to
implement this callbacks.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D133268
Factor our InstrumentationIRBuilder and share it between ThreadSanitizer
and SanitizerCoverage. Simplify its usage at the same time (use function
of passed Instruction or BasicBlock).
This class may be used in other instrumentation passes in future.
NFCI.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D125038
When building with debug info enabled, some load/store instructions do
not have a DebugLocation attached. When using the default IRBuilder, it
attempts to copy the DebugLocation from the insertion-point instruction.
When there's no DebugLocation, no attempt is made to add one.
This is problematic for inserted calls, where the enclosing function has
debug info but the call ends up without a DebugLocation in e.g. LTO
builds that verify that both the enclosing function and calls to
inlinable functions have debug info attached.
This issue was noticed in Linux kernel KCSAN builds with LTO and debug
info enabled:
| ...
| inlinable function call in a function with debug info must have a !dbg location
| call void @__tsan_read8(i8* %432)
| ...
To fix, ensure that all calls to the runtime have a DebugLocation
attached, where the possibility exists that the insertion-point might
not have any DebugLocation attached to it.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D124937
Running iwyu-diff on LLVM codebase since fa5a4e1b95c8f37796 detected a few
regressions, fixing them.
Differential Revision: https://reviews.llvm.org/D124847
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove ThreadSanitizerLegacyPass.
Reviewed By: #sanitizers, vitalybuka
Differential Revision: https://reviews.llvm.org/D124209
An analysis may just be interested in checking if an instruction is
atomic but system scoped or single-thread scoped, like ThreadSanitizer's
isAtomic(). Unfortunately Instruction::isAtomic() can only answer the
"atomic" part of the question, but to also check scope becomes rather
verbose.
To simplify and reduce redundancy, introduce a common helper
getAtomicSyncScopeID() which returns the scope of an atomic operation.
Start using it in ThreadSanitizer.
NFCI.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D121910
Tsan pass does 2 optimizations based on presence of calls:
1. Don't emit function entry/exit callbacks if there are no calls
and no memory accesses.
2. Combine read/write of the same variable if there are no
intervening calls.
However, all debug info is represented as CallInst as well
and thus effectively disables these optimizations.
Don't consider debug info calls as calls.
Reviewed By: glider, melver
Differential Revision: https://reviews.llvm.org/D114079
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.