This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the AVX512
equivalents of the pmaddw and pmaddubs intrinsics:
<16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
<32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64
x i8>, <64 x i8>) are currently handled strictly, which is suboptimal.
llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>),
llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and
llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically
handled using maybeHandleSimpleNomemIntrinsic, which is incorrect.
Since the second argument is the shuffle order, we instrument all these
intrinsics using `handleIntrinsicByApplyingToShadow(...,
/*trailingVerbatimArgs=*/1)`
(https://github.com/llvm/llvm-project/pull/114490).
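For illustration, applying the intrinsic to the shadow looks roughly like the following for the 128-bit pshufb. This is only a sketch (function name and layout are mine; how an uninitialized shuffle-order operand is checked is not shown):
```
declare <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)

; %op0_shadow is the shadow of the data operand; %order is the original
; (concrete) shuffle-order operand, passed through verbatim because it only
; selects which shadow bytes end up in which lanes.
define <16 x i8> @pshufb_shadow(<16 x i8> %op0_shadow, <16 x i8> %order) {
  %out_shadow = call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %op0_shadow, <16 x i8> %order)
  ret <16 x i8> %out_shadow
}
```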
This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e.,
relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted
because it was subtly wrong: multiplying an uninitialized zero should
not result in an initialized zero.
This reland fixes the issue by using instrumentation analogous to
visitAnd (the bitwise AND of an initialized zero and an uninitialized value
results in an initialized zero). Additionally, this reland expands a
test case; fixes the commit message; and optimizes the change to avoid
the need for horizontalReduce.
The current instrumentation has false positives: it does not take into
account that multiplying an initialized zero value with an uninitialized
value results in an initialized zero value. This change fixes the issue
during the multiplication step. The horizontal add step is modeled using
bitwise OR.
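As a rough sketch of the idea for a single pair of products (scalar i16 lanes, names invented; the actual handler works on whole vectors and handles widening, so this is not the emitted IR):
```
; Shadow of one product, approximated as in visitAnd:
;   S = (Sa & Sb) | (Va & Sb) | (Sa & Vb)
; so multiplying by an initialized zero yields an initialized (zero) shadow.
define i16 @pmadd_pair_shadow(i16 %va, i16 %sa, i16 %vb, i16 %sb,
                              i16 %vc, i16 %sc, i16 %vd, i16 %sd) {
  ; shadow of va * vb
  %t0 = and i16 %sa, %sb
  %t1 = and i16 %va, %sb
  %t2 = and i16 %sa, %vb
  %m0 = or i16 %t0, %t1
  %s0 = or i16 %m0, %t2
  ; shadow of vc * vd
  %u0 = and i16 %sc, %sd
  %u1 = and i16 %vc, %sd
  %u2 = and i16 %sc, %vd
  %m1 = or i16 %u0, %u1
  %s1 = or i16 %m1, %u2
  ; horizontal add of the two products: modeled with bitwise OR of their shadows
  %s = or i16 %s0, %s1
  ret i16 %s
}
```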
Future work can apply this improved handler to the AVX512 equivalent
intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512) and AVX
VNNI intrinsics.
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with an initialized zero results in an initialized zero value.
This change allows elements that are initialized zeros to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512) and AVX VNNI intrinsics.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
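For example (hand-written, assuming the post-change textual form in which the size operand is simply gone):
```
; before: call void @llvm.lifetime.start.p0(i64 8, ptr %a)
; after:  the size is implied by the alloca itself
define void @lifetime_example() {
  %a = alloca i64
  call void @llvm.lifetime.start.p0(ptr %a)
  call void @llvm.lifetime.end.p0(ptr %a)
  ret void
}

declare void @llvm.lifetime.start.p0(ptr)
declare void @llvm.lifetime.end.p0(ptr)
```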
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions: lifetimes with poison arguments can simply be skipped.
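For example, a marker left behind after its alloca was deleted and RAUW'd with poison (hand-written sketch):
```
define void @poison_lifetime_example() {
  ; The alloca was removed and its uses replaced with poison; the marker is
  ; now allowed and carries no information, so it can simply be skipped.
  call void @llvm.lifetime.start.p0(ptr poison)
  ret void
}

declare void @llvm.lifetime.start.p0(ptr)
```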
Fixes https://github.com/llvm/llvm-project/issues/151119.
e.g.,
<16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
<32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
<64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
where A and x are packed matrices (the vector operands), b is a vector (the i8 immediate), and Out = A * x + b in GF(2).
Multiplication in GF(2) is equivalent to bitwise AND. However, the
matrix computation also includes a parity calculation.
For the bitwise AND of bits V1 and V2, the exact shadow is:
Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow & V2)
We approximate the shadow of gf2p8affine using:
Out_Shadow = _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_Shadow, 0)
           | _mm512_gf2p8affine_epi64_epi8(x, A_Shadow, 0)
           | _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0)
           | _mm512_set1_epi8(b_Shadow)
This approximation has false negatives: if an intermediate dot-product
in the shadow computation contains an even number of 1's, its parity is
0 and the uninitialized bits are not propagated.
It has no false positives.
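Written out in IR terms for the 128-bit variant, the approximation looks roughly like this (a sketch, not the emitted code; function and value names are mine, the operand roles mirror the formula above, and %sb_splat stands for the shadow of the immediate b splatted across all bytes):
```
declare <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)

define <16 x i8> @gf2p8affine_shadow(<16 x i8> %x, <16 x i8> %x_shadow,
                                     <16 x i8> %a, <16 x i8> %a_shadow,
                                     <16 x i8> %sb_splat) {
  %t0 = call <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8> %x_shadow, <16 x i8> %a_shadow, i8 0)
  %t1 = call <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8> %x, <16 x i8> %a_shadow, i8 0)
  %t2 = call <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8> %x_shadow, <16 x i8> %a, i8 0)
  %o0 = or <16 x i8> %t0, %t1
  %o1 = or <16 x i8> %o0, %t2
  %out_shadow = or <16 x i8> %o1, %sb_splat
  ret <16 x i8> %out_shadow
}
```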
Updates the test from https://github.com/llvm/llvm-project/pull/149258
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
After #149310 the pointer argument of lifetime.start/lifetime.end is
guaranteed to be an alloca, so we don't need to go through
findAllocaForValue() anymore, and no longer need special handling
for the case where it fails.
When disjoint OR was specified and a bit position contained a 1 in both
operands, #145990 would set the corresponding shadow bit to
uninitialized. However, the output of the operation is (LLVM) 'poison'
for the entire result, hence the entire shadow ought to be
uninitialized. This patch corrects the issue.
Note: since this patch reduces false negatives, buggy code that formerly
passed with MSan may start failing.
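For example (hand-written illustration):
```
define i8 @disjoint_or_example(i8 %a, i8 %b) {
  ; If %a and %b both have, say, bit 0 set at runtime, this 'or disjoint' is
  ; poison as a whole, so the entire result shadow is now marked uninitialized
  ; rather than just the overlapping bit.
  %r = or disjoint i8 %a, %b
  ret i8 %r
}
```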
The current MSan handler for abs, like Hercules' in New York, ignores
is_int_min_poison. This patch fixes the issue by poisoning the shadow
corresponding to each int_min input value if is_int_min_poison.
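For example (a sketch of the semantics being modeled, not the instrumentation itself):
```
declare i32 @llvm.abs.i32(i32, i1)

define i32 @abs_int_min_poison(i32 %x) {
  ; With is_int_min_poison = true, the result is poison when %x is INT_MIN
  ; (-2147483648), so MSan now poisons the shadow for such inputs.
  %r = call i32 @llvm.abs.i32(i32 %x, i1 true)
  ret i32 %r
}
```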
This handles `llvm.x86.avx512.mask.pmov{,s,us}.*.512` using
`handleIntrinsicByApplyingToShadow()` where possible, otherwise using a
customized slow-path handler, `handleAVX512VectorDownConvert()`.
Note that shadow propagation of `pmov{s,us}` (signed/unsigned
saturation) is approximated using truncation. Future work could extend
`handleAVX512VectorDownConvert()` to use `GetMinMaxUnsigned()` to handle
saturation precisely.
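The truncation approximation, in isolation, amounts to something like this (shapes chosen only for illustration; the real handler must also account for any remaining operands of the masked intrinsics):
```
; Shadow of a saturating down-convert, approximated by truncating the shadow of
; the source elements to the destination element type. Uninitialized bits above
; the destination width are dropped, which can under-report cases where they
; would have changed the saturation decision.
define <8 x i32> @downconvert_shadow_approx(<8 x i64> %src_shadow) {
  %out_shadow = trunc <8 x i64> %src_shadow to <8 x i32>
  ret <8 x i32> %out_shadow
}
```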
The "V1" and "V2" values were already NOT'ed, hence the calculation of disjoint OR in #145990 was incorrect. This patch fixes the issue, with some refactoring and renaming of variables.
The disjoint OR (https://github.com/llvm/llvm-project/pull/72583) of two '1's is poison, hence MSan ought to consider the result uninitialized (rather than initialized, i.e., a false negative, as per the existing instrumentation, which ignores disjointedness). This patch adds a flag, `-msan-precise-disjoint-or`, which defaults to false (the legacy behavior). A future patch will default this flag to true.
Updates the test from https://github.com/llvm/llvm-project/pull/145982
The current instrumentation of Intrinsic::{ctlz,cttz} has false positives. For example, consider `ctlz(0001 11??)`, where `0` and `1` denote initialized bits (with concrete values of 0 and 1 respectively) and `?` denotes an uninitialized bit. The result (of 3) is well-defined and the shadow ought to be fully initialized, but the current instrumentation marks it as fully uninitialized.
This patch improves the fidelity of the instrumentation by comparing the number of leading (for ctlz; trailing for cttz) zeros in the concrete value and the shadow.
This patch also renames the function from 'handleCountZeroes' to 'handleLeadingTrailingCountZeros', to clarify that the intrinsics handled do not count all the zeros (unlike `llvm.ctpop`, which counts all the 1s).
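The worked example from above, written as IR for concreteness (hand-written; the exact comparison the instrumentation emits is in the patch):
```
declare i8 @llvm.ctlz.i8(i8, i1)

define i8 @ctlz_example(i8 %v) {
  ; Suppose %v is 0001 11?? with shadow 0000 0011 (the low two bits are
  ; uninitialized). The leading 1 sits before the first uninitialized bit
  ; (3 leading zeros in the value vs. 6 in the shadow), so the result is 3
  ; regardless of the '?' bits and the output shadow can be fully initialized.
  %r = call i8 @llvm.ctlz.i8(i8 %v, i1 false)
  ret i8 %r
}
```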
This reverts commit 5eb5f0d8760c6b757c1da22682b5cf722efee489 i.e.,
relands 1b71ea411a9d36705663b1724ececbdfec7cc98c.
The test case was failing on aarch64 because the long double type is
implemented differently on x86 vs aarch64. This reland restricts the
test to x86.
----
Original CL description:
A commonly used aid for debugging MSan reports is
`__msan_print_shadow()`, which requires manual app code annotations
(typically of the variable in the UUM report or nearby). This is in
contrast to ASan, which automatically prints out the shadow map when a
check fails.
This patch changes MSan to print the shadow that failed an outlined
check (checks are outlined per function after the
`-msan-instrumentation-with-call-threshold` is exceeded) if verbosity >=
1. Note that we do not print out the shadow map of "neighboring"
variables because this is technically infeasible; see "Caveat" below.
This patch can be easier to use than `__msan_print_shadow()` because
this does not require manual app code annotations. Additionally, due to
optimizations, `__msan_print_shadow()` calls can sometimes spuriously
affect whether a variable is initialized.
As a side effect, this patch also enables outlined checks for
arbitrary-sized shadows (vs. the current hardcoded handlers for
{1,2,4,8}-byte shadows).
Caveat: the shadow does not necessarily correspond to an individual user
variable, because MSan instrumentation may combine and/or truncate
multiple shadows prior to emitting a check that the mangled shadow is
zero (e.g., `convertShadowToScalar()`,
`handleSSEVectorConvertIntrinsic()`, `materializeInstructionChecks()`).
OTOH it is arguably a strength that this feature emits the shadow that
directly matters for the MSan check, but which cannot be obtained using
the MSan API.
This patch adds an off-by-default flag which, when enabled via `-mllvm -msan-poison-undef-vectors=true`, fixes a false negative in MSan (partially-undefined constant fixed-length vectors). It is currently off by default since, by fixing the false negative, code/tests that previously passed MSan may start failing. The default will be changed in a future patch.
Prior to this patch, MSan computes that partially-undefined constant fixed-length vectors are fully initialized, which leads to false negatives; moreover, benign vector rewriting could theoretically flip MSan's shadow computation from initialized to uninitialized or vice-versa (*). `-msan-poison-undef-vectors=true` calculates the shadow precisely: for each element of the vector, the corresponding shadow is fully uninitialized if the element is undefined/poisoned, otherwise it is fully initialized.
Updates the test from https://github.com/llvm/llvm-project/pull/143823
(*) For example:
```
%x = insertelement <2 x i64> <i64 0, i64 poison>, i64 42, i64 0
%y = insertelement <2 x i64> <i64 poison, i64 poison>, i64 42, i64 0
```
%x and %y are equivalent but, prior to this patch, MSan incorrectly computes the shadow of %x as <0, 0> rather than <0, -1>.
This happens because `getShadow(Value *V)` has a special case for fully undefined/poisoned values, but partially undefined values fall through and are given a clean shadow. This leads to false negatives (no false positives).
Note: MSan correctly handles InsertElementInst, but the shadow of the initial constant vector may still be wrong and be propagated.
Showing that the same approximation happens for other composite types is left as an exercise for the reader.
With a better understanding of the PowerPC32 ABI, we've rewritten
`VarArgPowerPC32Helper`.
The new implementation fills the shadow for both `reg_save_area` and
`overflow_arg_area`. It does not copy shadow for floating-point
arguments, as they are stored in a separate space.
This implementation does not fully support passing arguments `byVal`;
this will be fixed in future PRs.
Tests were also updated via `llvm/utils/update_test_checks.py`.
Changed the last parameter type of `MsanMetadataPtrForLoadN` and
`MsanMetadataPtrForStoreN` from `i64` to `IntptrTy` to keep it
consistent with the call in `getShadowOriginPtrKernelNoVec`.
Co-authored-by: Kamil Kashapov <kashapov@ispras.ru>
cvt(t?)ps2dq/cvt(t?)pd2dq and cvtpd2ps are currently handled strictly.
This patch handles them using handleSSEVectorConvertIntrinsicByProp
(from https://github.com/llvm/llvm-project/pull/130705), generalized to
handle SSE intrinsics that do not have a rounding mode parameter.
This adds an explicit handler for:
- llvm.aarch64.neon.ld1x2, llvm.aarch64.neon.ld1x3,
llvm.aarch64.neon.ld1x4
- llvm.aarch64.neon.ld2, llvm.aarch64.neon.ld3, llvm.aarch64.neon.ld4
- llvm.aarch64.neon.ld2lane, llvm.aarch64.neon.ld3lane,
llvm.aarch64.neon.ld4lane
- llvm.aarch64.neon.ld2r, llvm.aarch64.neon.ld3r, llvm.aarch64.neon.ld4r
instead of relying on the default strict handler.
Updates the tests from https://github.com/llvm/llvm-project/pull/125267