Create a new `IRBuilderBase::CreateAllocationSize` method to compute the
runtime size of an alloca as a Value*. This handles both static and
dynamic allocas by computing `ArraySize * ElementSize`, and using
CreateTypeSize to properly handle scalable vectors.
This de-duplicates code across multiple instrumentation and analysis
passes and increases consistency.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
For some backends, e.g., BPF, it is desirable to only sanitize memory
belonging to specific address spaces. More specifically, it is sometimes
desirable to only apply address sanitization for arena memory belonging
to address space 1. However, AddressSanitizer currently does not support
selectively sanitizing address spaces. Add a new option to select which
address spaces to apply AddressSanitizer to.
No functional change for existing targets (namely AMD GPU) that hardcode
which address spaces to sanitize
The AddressSanitizer transform does not have a default offset registered
for the shadow map. Set the default shadow map offset for BPF be
dynamically set by the KASAN implementation.
The AddressSanitizer transform currently defaults to placing the shadow
map in address space 0, but it is desirable for some targets (namely
BPF) to select a different address space for the map. Add a compilation
option for specifying the address space of the target.
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
**Mitigation for:** https://github.com/google/sanitizers/issues/749
**Disclosure:** I'm not an ASan compiler expert yet (I'm trying to
learn!), I primarily work in the runtime. Some of this PR was developed
with the help of AI tools (primarily as a "fuzzy `grep` engine"), but
I've manually refined and tested the output, and can speak for every
line. In general, I used it only to orient myself and for
"rubberducking".
**Context:**
The msvc ASan team (👋 ) has received an internal request to improve
clang's exception handling under ASan for Windows. Namely, we're
interested in **mitigating** this bug:
https://github.com/google/sanitizers/issues/749
To summarize, today, clang + ASan produces a false-positive error for
this program:
```C++
#include <cstdio>
#include <exception>
int main()
{
try {
throw std::exception("test");
}catch (const std::exception& ex){
puts(ex.what());
}
return 0;
}
```
The error reads as such:
```
C:\Users\dajusto\source\repros\upstream>type main.cpp
#include <cstdio>
#include <exception>
int main()
{
try {
throw std::exception("test");
}catch (const std::exception& ex){
puts(ex.what());
}
return 0;
}
C:\Users\dajusto\source\repros\upstream>"C:\Users\dajusto\source\repos\llvm-project\build.runtimes\bin\clang.exe" -fsanitize=address -g -O0 main.cpp
C:\Users\dajusto\source\repros\upstream>a.exe
=================================================================
==19112==ERROR: AddressSanitizer: access-violation on unknown address 0x000000000000 (pc 0x7ff72c7c11d9 bp 0x0080000ff960 sp 0x0080000fcf50 T0)
==19112==The signal is caused by a READ memory access.
==19112==Hint: address points to the zero page.
#0 0x7ff72c7c11d8 in main C:\Users\dajusto\source\repros\upstream\main.cpp:8
#1 0x7ff72c7d479f in _CallSettingFrame C:\repos\msvc\src\vctools\crt\vcruntime\src\eh\amd64\handlers.asm:49
#2 0x7ff72c7c8944 in __FrameHandler3::CxxCallCatchBlock(struct _EXCEPTION_RECORD *) C:\repos\msvc\src\vctools\crt\vcruntime\src\eh\frame.cpp:1567
#3 0x7ffb4a90e3e5 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18012e3e5)
#4 0x7ff72c7c1128 in main C:\Users\dajusto\source\repros\upstream\main.cpp:6
#5 0x7ff72c7c33db in invoke_main C:\repos\msvc\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
#6 0x7ff72c7c33db in __scrt_common_main_seh C:\repos\msvc\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
#7 0x7ffb49b05c06 (C:\WINDOWS\System32\KERNEL32.DLL+0x180035c06)
#8 0x7ffb4a8455ef (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800655ef)
==19112==Register values:
rax = 0 rbx = 80000ff8e0 rcx = 27d76d00000 rdx = 80000ff8e0
rdi = 80000fdd50 rsi = 80000ff6a0 rbp = 80000ff960 rsp = 80000fcf50
r8 = 100 r9 = 19930520 r10 = 8000503a90 r11 = 80000fd540
r12 = 80000fd020 r13 = 0 r14 = 80000fdeb8 r15 = 0
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: access-violation C:\Users\dajusto\source\repros\upstream\main.cpp:8 in main
==19112==ABORTING
```
The root of the issue _appears to be_ that ASan's instrumentation is
incompatible with Window's assumptions for instantiating `catch`-block's
parameters (`ex` in the snippet above).
The nitty gritty details are lost on me, but I understand that to make
this work without loss of ASan coverage, a "serious" refactoring is
needed. In the meantime, users risk false positive errors when pairing
ASan + catch-block parameters on Windows.
**To mitigate this** I think we should avoid instrumenting catch-block
parameters on Windows. It appears to me this is as "simple" as marking
catch block parameters as "uninteresting" in
`AddressSanitizer::isInterestingAlloca`. My manual tests seem to confirm
this.
I believe this is strictly better than today's status quo, where the
runtime generates false positives. Although we're now explicitly
choosing to instrument less, the benefit is that now more programs can
run with ASan without _funky_ macros that disable ASan on exception
blocks.
**This PR:** implements the mitigation above, and creates a simple new
test for it.
_Thanks!_
---------
Co-authored-by: Antonio Frighetto <me@antoniofrighetto.com>
This patch is based on https://github.com/llvm/llvm-project/pull/159713
This patch extends AddressSanitizer to support indexed/segment
instructions in RVV. It enables proper instrumentation for these memory
operations.
A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to
describe the offset between the base pointer and the actual memory
reference address.
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
[Previously reverted due to failures on asan-rvv-intrinsics.ll, the test
case is riscv only and it is triggered by other target]
Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and
add `; REQUIRES: riscv-registered-target` in test case to skip the
configuration that doesn't register riscv target.
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
After #149310 the pointer argument of lifetime.start/lifetime.end is
guaranteed to be an alloca, so we don't need to go through
findAllocaForValue() anymore, and don't have to have special handling
for the case where it fails.
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
With the advent of intrinsic-less debug-info, we no longer need to
scatter calls to getPrevNonDebugInstruction around the codebase. Remove
most of them -- there are one or two that have the "SkipPseudoOp" flag
turned on, however they don't seem to be in positions where skipping
anything would be reasonable.
I'm working on porting ASan to Wasm/WASI targets, and this is the first
part of the change sets. I'll post runtime changes separately.
This change makes `-fsanitize=address` available for WASI target by
replicating what we do for Emscripten because they share the same memory
model.
…ncorrect name
Clang needs variables to be represented with unique names. This means
that if a variable shadows another, its given a different name
internally to ensure it has a unique name. If ASan tries to use this
name when printing an error, it will print the modified unique name,
rather than the variable's source code name
Fixes#47326
… unnecessary FunctionSanitizer construction (NFC)
This patch moves several early-exit checks (e.g., empty function, etc.)
out of `AddressSanitizer::instrumentFunction` and into the caller. This
change avoids unnecessary construction of FunctionSanitizer when
instrumentation is not needed.
If left as-is, subsequent optimizations might utilize the possible
memory effects and optimize-out the instrumentation. Think of the
following case:
```
store i8 4, ptr %shadow
call void @llvm.lifetime.start.p0(i64 4, ptr %local)
%28 = call void @foo(ptr %local)
store i8 -8, ptr %shadow
call void @llvm.lifetime.end.p0(i64 4, ptr %local)
```
where `foo` is an external function with `memory(argmem: write)`. A pass
such as DeadStoreElimination is allowed to remove the initial store,
which might fail sanitizer checks within `foo`.
My first attempt was to add a `memory(readwrite)` at the call-site
level, but unfortunately the current implementation of
`getMemoryEffects` doesn't exactly give it "precedence" as specified,
but rather restricts the access specified by the call-site and not the
other way around as well.
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibilty purses, and instead
write out needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part in this patch is that the default triple is
Triple("") instead of Triple() to preserve existing behavior. The former
defaults to using the ELF object format instead of unknown object
format. We should fix that as well.
To finalise the "RemoveDIs" work removing debug intrinsics, we're
updating call sites that insert instructions to use iterators instead.
This set of changes are those where it's not immediately obvious that
just calling getIterator to fetch an iterator is correct, and one or two
places where more than one line needs to change.
Overall the same rule holds though: iterators generated for the start of
a block such as getFirstNonPHIIt need to be passed into insert/move
methods without being unwrapped/rewrapped, everything else can use
getIterator.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This adds support for:
* `muslabin32` (MIPS N32)
* `muslabi64` (MIPS N64)
* `muslf32` (LoongArch ILP32F/LP64F)
* `muslsf` (LoongArch ILP32S/LP64S)
As we start adding glibc/musl cross-compilation support for these
targets in Zig, it would make our life easier if LLVM recognized these
triples. I'm hoping this'll be uncontroversial since the same has
already been done for `musleabi`, `musleabihf`, and `muslx32`.
I intentionally left out a musl equivalent of `gnuf64` (LoongArch
ILP32D/LP64D); my understanding is that Loongson ultimately settled on
simply `gnu` for this much more common case, so there doesn't *seem* to
be a particularly compelling reason to add a `muslf64` that's basically
deprecated on arrival.
Note: I don't have commit access.
It was found that ASAN logic optimizes away out-of-bound access
instrumentation for over-aligned arrays. See #108287 for complete code
examples.
Fix it by not considering alignment during object size calculation,
since out-of-bounds access for over-aligned object is still UB and
should be reported by ASAN.
Closes: #108287