This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.
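A minimal sketch of the idea, assuming MSan's existing helpers (getShadow,
getShadowTy, setShadow, setOriginForNaryOp); the handler name is
illustrative, and the real TBL/TBX handler also passes some operands
through unchanged:
```
// Compute the shadow of the result by applying the same intrinsic to the
// shadows of the operands.
void handleIntrinsicByApplyingToShadow(IntrinsicInst &I) {
  IRBuilder<> IRB(&I);
  SmallVector<Value *, 4> ShadowArgs;
  for (Value *Op : I.args())
    ShadowArgs.push_back(getShadow(Op)); // per-operand shadow, same layout
  Value *ShadowResult =
      IRB.CreateIntrinsic(I.getIntrinsicID(), {getShadowTy(&I)}, ShadowArgs);
  setShadow(&I, ShadowResult);
  setOriginForNaryOp(I);
}
```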
This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
CTMark (#113200) size overhead was 5.3%; now it is 4.7%.
The patch affects only signed integers.
https://alive2.llvm.org/ce/z/Lv5hyi
* The patch replaces code that extracted the sign bit,
maximized/minimized it, and then packed it back, with a
simple sign-bit flip. Another way to think about the
transformation is as subtracting MIN_SINT from
A/B: we map MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.
* Then, to maximize/minimize A/B, we do not need
to extract the sign bit; we can apply the shadow to it the
same way as to the other bits (see the sketch after this list).
* After the sign-bit flip, we have to switch to the unsigned
versions of the predicates.
* After the change above, getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine them into a single function.
* Because the function performs the sign-bit flip and
requires unsigned predicates on the returned values,
there is no point in keeping it as a class member;
to keep it hidden, we switch to a function-local lambda.
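A small standalone sketch of the sign-bit-flip idea (illustrative helper
names, 32-bit lanes assumed):
```
#include <cassert>
#include <cstdint>

// Flipping the sign bit maps signed order onto unsigned order:
// MIN_SINT -> 0, 0 -> 0x80000000, MAX_SINT -> MAX_UINT.
uint32_t flipSign(int32_t V) { return static_cast<uint32_t>(V) ^ 0x80000000u; }

// With the sign bit flipped, the shadow applies to every bit uniformly:
// set unknown bits for the highest possible value, clear them for the
// lowest. Unsigned predicates are then used on the results.
uint32_t highestPossible(int32_t V, uint32_t Shadow) { return flipSign(V) | Shadow; }
uint32_t lowestPossible(int32_t V, uint32_t Shadow) { return flipSign(V) & ~Shadow; }

int main() {
  int32_t A = -5, B = 7;
  // Signed comparison agrees with unsigned comparison of the flipped values.
  assert((A < B) == (flipSign(A) < flipSign(B)));
  // With no poisoned bits, the highest and lowest possible values coincide.
  assert(highestPossible(A, 0) == lowestPossible(A, 0));
  return 0;
}
```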
Fixes #111212.
This grows .text by 5.3% on CTMark (or 2.6% on a large internal binary).
Perf regressed by 1.6%. We will try to improve this in follow-up patches.
It is worth paying some performance regression to fix correctness and
avoid issues like #111212.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation for
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
Everything in this pass uses a single addrspace 0 pointer type.
Don't try to create it using the typed pointer ctor.
This allows removing the type argument from
getShadowPtrForVAArgument().
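With opaque pointers this is a one-liner (a small sketch, assuming an
in-scope LLVMContext `Ctx` or IRBuilder `IRB`):
```
// The single addrspace-0 pointer type used throughout the pass; no pointee
// type is involved.
PointerType *PtrTy = PointerType::get(Ctx, /*AddressSpace=*/0);
// Equivalently, via an IRBuilder:
Type *SamePtrTy = IRB.getPtrTy();
```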
This PR makes the sanitizer passes idempotent.
When a sanitizer pass is run again after it has already been run, the
resulting IR contains double instrumentation, because the pass has no
check to verify whether the IR has already been instrumented.
This PR checks whether the "nosanitize_*" module flag is already present
and, if so, returns early without running the pass again.
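A hedged sketch of the early-exit shape; the flag name below is just an
illustrative member of the "nosanitize_*" family:
```
#include "llvm/IR/Module.h"
using namespace llvm;

// Returns true if the module was already instrumented by this pass;
// otherwise records the marker flag so later runs can skip it.
static bool alreadyInstrumented(Module &M) {
  if (M.getModuleFlag("nosanitize_memory")) // illustrative flag name
    return true;
  M.addModuleFlag(Module::ModFlagBehavior::Override, "nosanitize_memory", 1);
  return false;
}
```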
This adds support for the Arm NEON vector shift instructions that follow
the same pattern as x86 (handleVectorShiftIntrinsic).
VSLI is not supported because it does not follow the 2-argument pattern
expected by handleVectorShiftIntrinsic.
This patch also updates the arm64-vshift.ll MSan test that was
introduced in
5d0a12d3e9
Cloning the vst_ intrinsics to apply them to the shadows did not work if
the arguments were floating-point, since the shadows are integers. This
patch changes MSan to create an intrinsic of the correct integer types.
Additionally, this patch adds support for vst1x_{2,3,4}; these can be
handled similarly to vst_{2,3,4}, since in all cases we are adapting the
corresponding intrinsic.
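A rough sketch of the fix, using the usual MSan helpers; `ShadowAddr` is a
hypothetical name for the shadow of the pointer operand, and
origin/alignment handling is elided:
```
// Call the same vst intrinsic, but instantiated with the integer shadow
// types of the data operands, storing into the shadow of the address.
SmallVector<Value *, 4> ShadowArgs;
for (unsigned i = 0; i + 1 < I.arg_size(); ++i)
  ShadowArgs.push_back(getShadow(&I, i)); // integer shadow vectors
ShadowArgs.push_back(ShadowAddr);
IRB.CreateIntrinsic(I.getIntrinsicID(),
                    {ShadowArgs[0]->getType(), ShadowAddr->getType()},
                    ShadowArgs);
```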
This also updates the tests.
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
zext does not allow converting vector shadow to scalar, so we must
manually convert it prior to calling zext in materializeOneCheck, for
which the 'ConvertedShadow' parameter isn't actually guaranteed to be
scalar (1). Note that it is safe/no-op to call convertShadowToScalar on
a shadow that is already scalar.
In contrast, the storeOrigin function already converts the (potentially
vector) shadow to scalar; we add a comment to note why it is load
bearing.
(1) In materializeInstructionChecks():
```
// Disable combining in some cases. TrackOrigins checks each shadow to pick
// correct origin.
bool Combine = !MS.TrackOrigins;
...
if (!Combine) {
  materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
  continue;
}
```
The default intrinsic handling was to report any
uninitialized part of an argument. However, these intrinsics
take a mask that allows parts of the input to be ignored, so
it is OK for the vectors to be partially initialized.
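A self-contained illustration of why partially initialized vectors are
acceptable here, with plain C++ standing in for a masked vector intrinsic:
```
#include <array>
#include <cstddef>

// A masked "load": lanes whose mask bit is false are never read from Src,
// so uninitialized data in those lanes cannot influence the result.
std::array<int, 4> maskedLoad(const std::array<int, 4> &Src,
                              const std::array<bool, 4> &Mask,
                              const std::array<int, 4> &PassThru) {
  std::array<int, 4> Out{};
  for (std::size_t I = 0; I != 4; ++I)
    Out[I] = Mask[I] ? Src[I] : PassThru[I];
  return Out;
}
```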
In rare cases, `SplitBlockAndInsertSimpleForLoop` in `paintOrigin`
invalidates iterators and crashes.
Somehow the existing `SplitBlockAndInsertIfThen` does not invalidate
iterators.
MSan uses `__msan_param_tls` to pass the shadow of
arguments. The position of each argument is expected to be
known at compile time, which requires the argument's size to
be known. This is not true for vscale.
As a workaround, we require that vscale parameters
are always initialized; then we do not need to pass their
shadow.
Return values should work out of the box, as we do not
need to know their size at compile time.
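A hedged sketch of the call-site handling this implies, using MSan's
insertShadowCheck; the placement within the argument loop is simplified:
```
// Scalable-vector arguments have no compile-time size, so they cannot get a
// fixed slot in __msan_param_tls. Require them to be fully initialized at
// the call site and skip passing their shadow.
if (A->getType()->isScalableTy()) {
  insertShadowCheck(A, &CB); // warn if the vscale argument is poisoned
  continue;                  // no __msan_param_tls slot is reserved
}
```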
The code expects a compile-time `VectorOrPrimitiveTypeSizeInBits` value,
which is not available for vscale.
In the trivial case of identical types, we need to do nothing.
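A minimal sketch of the trivial case, assuming a CreateShadowCast-style
helper; the non-trivial branch is simplified:
```
Value *castShadow(IRBuilder<> &IRB, Value *V, Type *DstTy) {
  // Identical types need no cast and no size query (works for vscale).
  if (V->getType() == DstTy)
    return V;
  // Other cases keep using the existing fixed-size logic (simplified here).
  return IRB.CreateZExtOrTrunc(V, DstTy);
}
```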
Almost NFC; the instrumentation is as correct as it was before.
We need InstrumentationList grouped by origin instruction,
so we used stable_sort. However, these objects are already grouped,
because we never interleave sequences of `insertShadowCheck`
for different instructions.
Sorting by pointer had the artifact of depending on allocator behavior,
so we could insert checks in a different order.
There is no test, as I failed to reproduce this with `opt`. My guess
is that a reproducer would need to increase fragmentation in the
allocator.
We don't need to create these instances of ArrayRef because
ConstantDataVector::get takes ArrayRef, and ArrayRef can be implicitly
constructed from C arrays.
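For example (Ctx is an LLVMContext; the values are illustrative):
```
// ArrayRef is constructed implicitly from the C array, so no explicit
// ArrayRef<uint32_t>(...) wrapper is needed.
uint32_t Indices[] = {0, 1, 2, 3};
Constant *CV = ConstantDataVector::get(Ctx, Indices);
```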
We have a lot of repeated code with arbitrary constants.
The particular values are not important; one just needs to be
bigger than the other.
UR_NONTAKEN_WEIGHT is selected as it is the most common one.
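A hedged sketch of the unified use (Ctx is an in-scope LLVMContext; the
constant's magnitude is illustrative, only the taken/non-taken asymmetry
matters):
```
// One named weight pair marks the instrumented slow path as unlikely
// everywhere, replacing assorted magic numbers.
static constexpr uint32_t kTakenWeight = 1;
static constexpr uint32_t kNonTakenWeight = 1 << 20; // illustrative value
MDNode *Weights =
    MDBuilder(Ctx).createBranchWeights(kTakenWeight, kNonTakenWeight);
```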
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
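For illustration, the intended call shape after the reorder (SQ is a
SimplifyQuery built from the DataLayout):
```
// Depth can now be defaulted; callers only supply the value and the query.
bool NonZero = isKnownNonZero(V, SQ);
// Previously callers had to spell the default depth explicitly, e.g.
// isKnownNonZero(V, /*Depth=*/0, SQ), just to reach the SimplifyQuery.
```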
Modify #77393 to clear shadow memory using `llvm.memset.*` when the size
is large, similar to `shouldUseBZeroPlusStoresToInitialize` in clang for
`-ftrivial-auto-var-init=`. The intrinsic, if lowered to libcall, will
use the msan interceptor.
The instruction selector lowers a `StoreInst` to multiple stores, not
utilizing `memset`. When the size is large (e.g.
`store { [100 x i32] } zeroinitializer, ptr %12, align 1`), the
generated code will be long (and `CodeGenPrepare::optimizeInst` will
even crash for a huge size).
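A hedged sketch of the change; the threshold name and value are
illustrative, and the real patch mirrors clang's heuristic:
```
// Clear a large shadow region with one llvm.memset.* instead of expanding a
// huge zeroinitializer store into many scalar stores.
constexpr uint64_t kStoreVsMemsetThreshold = 32; // illustrative, in bytes
if (ShadowSize > kStoreVsMemsetThreshold)
  IRB.CreateMemSet(ShadowPtr, IRB.getInt8(0), ShadowSize, Align(1));
else
  IRB.CreateStore(Constant::getNullValue(ShadowTy), ShadowPtr);
```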
```
// Test stack size
#include <array>

template <class T>
void DoNotOptimize(const T& var) { // deprecated by https://github.com/google/benchmark/pull/1493
  asm volatile("" : "+m"(const_cast<T&>(var)));
}

int main() {
  using LargeArray = std::array<int, 1000000>;
  auto large_stack = []() { DoNotOptimize(LargeArray()); };
  // CodeGenPrepare::optimizeInst triggers an assertion failure when creating
  // an integer type with a bit width > 2**23.
  large_stack();
}
```
KMSAN defaults to `msan-handle-asm-conservative`, which inserts
`__msan_instrument_asm_store` calls to unpoison indirect outputs in
inline assembly (e.g. `=m` constraints in source).
```c
unsigned f() {
unsigned v;
// __msan_instrument_asm_store unpoisons v before invoking the asm.
asm("movl $1,%0" : "=m"(v));
return v;
}
```
Extend the mechanism to userspace, but for now require an explicit
`-mllvm -msan-handle-asm-conservative` for experimentation.
As
https://docs.kernel.org/dev-tools/kmsan.html#inline-assembly-instrumentation
says, this approach may mask certain errors (an indirect output may not
actually be initialized), but it also helps to avoid a lot of false
positives.
Link: https://github.com/google/sanitizers/issues/192
The caller puts argument shadow one by one into __msan_va_arg_tls until it
reaches kParamTLSSize. After that it still increments OverflowOffset but
does not store the shadow.
The callee needs OverflowOffset to prepare a shadow for the entire overflow
area. It does so by creating a "varargs shadow copy" for the complete list
of arguments, copying the available shadow from __msan_va_arg_tls, and
clearing the rest.
However, the callee does not know whether the tail of __msan_va_arg_tls was
unable to fit an argument; it will copy the tail shadow into the "varargs
shadow copy", which is later used as the shadow for the omitted argument.
Therefore, the unused tail of __msan_va_arg_tls must be cleared if it is
left unused.
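A self-contained model of the caller-side invariant this patch enforces
(plain C++, not the MSan code; kParamTLSSize matches MSan's 800-byte TLS
buffers):
```
#include <cstddef>
#include <cstring>

constexpr std::size_t kParamTLSSize = 800; // bytes in __msan_va_arg_tls

// The caller appends each argument's shadow and always advances
// OverflowOffset, but can only store what fully fits. The fix: when an
// argument no longer fits, zero the unused tail so the callee never reads
// stale bytes as that argument's shadow.
void appendVarArgShadow(char (&VAArgTLS)[kParamTLSSize], const char *ArgShadow,
                        std::size_t ArgSize, std::size_t &OverflowOffset) {
  std::size_t Begin = OverflowOffset;
  OverflowOffset += ArgSize; // callee uses this to size the overflow area
  if (Begin + ArgSize <= kParamTLSSize) {
    std::memcpy(VAArgTLS + Begin, ArgShadow, ArgSize); // shadow fits
  } else if (Begin < kParamTLSSize) {
    std::memset(VAArgTLS + Begin, 0, kParamTLSSize - Begin); // clear the tail
  }
}
```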
This allows us to enable compiler-rt/test/msan/vararg_shadow.cpp for
x86.
Reviewers: kstoimenov, thurstond
Reviewed By: thurstond
Pull Request: https://github.com/llvm/llvm-project/pull/72707