Currently, the SwitchInst:
```
switch i32 %Val, label %else [ i32 0, label %A
i32 1, label %B
i32 2, label %C ]
```
is strictly handled i.e., MSan will check that %Val is fully
initialized. This is appropriate nearly all the time.
However, sometimes the compiler may convert (icmp + br) into a switch
statement. (icmp + br) has different semantics: MSan allows icmp eq/ne
with partly initialized inputs to still result in a fully initialized
output, if there exists a bit that is initialized in both inputs with a
differing value e.g., suppose:
```
%A = 00000000 00001010
%B = 00000000 00000110
%C = 00000000 00000011
%Val = 00000001 ???????? (where ? denotes an uninitialized bit)
```
Even though %Val has uninitialized bits, the initialized '1' bit
immediately to the left, compared to the corresponding initialized '0'
bit in %A/%B/%C suffices to prove that %Val does not match any of those
cases. This is similar to a real-world case with std::optional (where
the has_value bit may be initialized but the value is not).
This patch adds this relaxed icmp logic to the switch instrumentation as
well, to make MSan's behavior equivalent under optimization.
Note that this edge case only applies if the switch input value
definitively does not match *any* of the cases (matching any of the
cases requires an exact, fully initialized match). If it is uncertain
whether the switch input value could, depending on the uninitialized
bits, match one of the cases or not, MSan will report
use-of-uninitialized memory.
handleAVX512VectorGenericMaskedFP() assumes there is one vector of data
(excluding the mask). This patch generalizes it to allow multiple
vectors of data, which we assume will be munged together.
Future work can apply this to intrinsics such as:
```
<16 x float> @llvm.x86.avx512.mask.scalef.ps.512
(<16 x float>, <16 x float>, <16 x float>, i16, i32)
WriteThru A B Mask Rounding
```
This patch does not change MSan's instrumentation.
-msan-dump-{heuristic,strict}-instructions currently prints out two
lines per instruction:
1) instruction name only e.g., `call llvm.aarch64.neon.uqsub.v16i8`
2) the full instruction, including actual variables e.g., `%vqsubq_v.i15
= call noundef <16 x i8> @llvm.aarch64.neon.uqsub.v16i8(<16 x i8>
%vext21.i, <16 x i8> splat (i8 1)), !dbg !66`
Option 1) is too sparse for some uses, because it does not contain the
return types or parameter types (although `.v16i8` is part of the
function name in this example, in general, the function name does not
describe the types completely; e.g., `<16 x float>
llvm.x86.avx512.mask.scalef.ps.512(<16 x float>, <16 x float>, <16 x
float>, i16, i32)`). OTOH option 2) can be too verbose because it
contains the actual variables.
This patch adds a Goldilocks-verbosity line:
- instruction prototype (including return type and parameter types)
e.g., `call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)`.
Note that this uses the base/non-overloaded function name, since the
parameter types are already included in the output.
aarch64.neon.bfmlalb/t perform dot-products after zeroing out the
odd/even-indexed values. We handle these by generalizing
handleVectorDotProductIntrinsic() and (mis-)using getPclmulMask().
Create a new `IRBuilderBase::CreateAllocationSize` method to compute the
runtime size of an alloca as a Value*. This handles both static and
dynamic allocas by computing `ArraySize * ElementSize`, and using
CreateTypeSize to properly handle scalable vectors.
This de-duplicates code across multiple instrumentation and analysis
passes and increases consistency.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
handleVectorComparePackedIntrinsic() can currently handle x86 and Arm
NEON vector comparisons, but is a bit lax about checking the number of
operands. This patch parameterizes the handler to check for the correct
number of operands, and also that the 3rd operand in x86 vector
comparisons is an ImmArg.
This fills in missing gaps in MSan's AArch64 NEON vector conversion
intrinsic handling (intrinsics named aarch64_neon_vcvt* instead of
aarch64_neon_fcvt*). SVE support sold separately.
It also generalizes handleNEONVectorConvertIntrinsic to handle
conversions to/from fixed-point.
Propagate shadow by reusing existing `handleVectorPmaddIntrinsic()`
(used for analogous x86 instructions; renamed to
`handleVectorDotProductIntrinsic()`), instead of strictly handling.
Instead of strictly handling smmla/ummla/usmmla, this patch propagates
the shadow, with each output element considered initialized if all its
constituent inputs are fully initialized.
Split out from https://github.com/llvm/llvm-project/pull/171456.
This explicitly allows implicit truncation in a number of places,
prior to switching the default. This limits the scope of the
initial change.
MSan's handleVectorPmaddIntrinsic had an EltSizeInBits parameter to
override the incorrect element size for VNNI intrinsics. Now that the
element size has been corrected
(https://github.com/llvm/llvm-project/issues/97271), it is no longer
necessary to override the element size.
This patch also updates the comments.
Fixed the argument types of the following intrinsics to match with the
ISA:
- vpdpwssd_128, vpdpwssd_256, vpdpwssd_512,
- vpdpwssds_128, vpdpwssds_256, vpdpwssds_512
- vpdpwsud_128, vpdpwsud_256, vpdowsud_512
- vpdpwsuds_128, vpdpwsuds_256, vpdpwsuds_512
- vpdpwusd_128, vpdpwusd_256, vpdpwusd_512
- vpdpwusds_128, vpdpwusds_256, vpdpwusds_512
- vpdpwuud_128, vpdpwuud_256, vpdpwuud_512
- vpdpwuuds_128, vpdpwuuds_256, vpdpwuuds_512
Fixes#97271. Note that this is the last PR for the issue.
These horizontal add/sub instructions are currently handled by
adding/subtracting tuples of the first operand, followed by tuples of
the second operand. This is not the correct semantics for the 256-bit
insructions: they process the first half of the first operand, then the
first half of the second operand, then the second half of the first
operand, and finally the second half of the second operand (trust me bro
[*]).
This patch fixes the issue by applying the "shards" functionality that
was added in https://github.com/llvm/llvm-project/pull/167954, to handle
the top and bottom 128-bit "shards" in turn.
[*] clang/test/CodeGen/X86/avx2-builtins.c:
```
TEST_CONSTEXPR(match_v8si(_mm256_hadd_epi32(
(__m256i)(__v8si){10, 20, 30, 40, 50, 60, 70, 80},
(__m256i)(__v8si){5, 15, 25, 35, 45, 55, 65, 75}),
30,70,20,60,110,150,100,140));
```
This will allow fixing up the handling of AVX2 phadd/phsub instructions
in a future patch, by setting Shards = 2.
Currently, the extra functionality is not used.
Use the generalized handleVectorPmaddIntrinsic(), but multiplication by
an initialized zero does not guarantee that the result is zero
(counter-example: multiply zero by NaN).
This generalizes `handleVectorPmaddIntrinsic()`:
- potentially handle floating-point type intrinsics (e.g.,
`llvm.x86.avx512bf16.dpbf16ps.512`). This usage is not enabled yet.
- "multiplication with an initialized zero guarantees that the
corresponding output becomes initialized" is now gated by a parameter
MSan currently crashes at compile-time when it encounters
target("aarch64.svcount") (e.g.,
https://github.com/llvm/llvm-project/pull/164315). This patch duct-tapes
MSan so that it won't crash at compile-time, and instead propagates a
clean shadow (resulting in false negatives but not false positives).
The `masked.load`, `masked.store`, `masked.gather` and `masked.scatter`
intrinsics currently accept a separate alignment immarg. Replace this
with an `align` attribute on the pointer / vector of pointers argument.
This is the standard representation for alignment information on
intrinsics, and is already used by all other memory intrinsics. This
means the signatures now match llvm.expandload, llvm.vp.load, etc.
(Things like llvm.memcpy used to have a separate alignment argument as
well, but were already migrated a long time ago.)
It's worth noting that the masked.gather and masked.scatter intrinsics
previously accepted a zero alignment to indicate the ABI type alignment
of the element type. This special case is gone now: If the align
attribute is omitted, the implied alignment is 1, as usual. If ABI
alignment is desired, it needs to be explicitly emitted (which the
IRBuilder API already requires anyway).
MemorySanitizer currently does a lot of pointer arithmetic using
ptrtoint+add+inttoptr instead of using getelementptr. As far as I can
tell, there is no need to use this pattern -- msan is not trying to
synthesize pointers with different provenance here. The pointers in
question stay within one object (like the TLS parameter area).
I suspect that this is just a leftover from pre-opaque-pointer types
where this was a natural way to perform offset arithmetic. Nowadays we
should just emit a getelementptr i8, aka ptradd.
This generalizes handleAVX512VectorGenericMaskedFP() (introduced in
#158397), to potentially handle intrinsics that have A/WriteThru/Mask in
an operand order that is different to AVX512/AVX10 rcp and rsqrt. Any
operands other than A and WriteThru must be fully initialized.
For example, the generalized handler could be applied in follow-up work
to many of the AVX512 rndscale intrinsics:
```
<32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512(<32 x half>, i32, <32 x half>, i32, i32)
<16 x float> @llvm.x86.avx512.mask.rndscale.ps.512(<16 x float>, i32, <16 x float>, i16, i32)
<8 x double> @llvm.x86.avx512.mask.rndscale.pd.512(<8 x double>, i32, <8 x double>, i8, i32)
A Imm WriteThru Mask Rounding
<8 x float> @llvm.x86.avx512.mask.rndscale.ps.256(<8 x float>, i32, <8 x float>, i8)
<4 x float> @llvm.x86.avx512.mask.rndscale.ps.128(<4 x float>, i32, <4 x float>, i8)
<4 x double> @llvm.x86.avx512.mask.rndscale.pd.256(<4 x double>, i32, <4 x double>, i8)
<2 x double> @llvm.x86.avx512.mask.rndscale.pd.128(<2 x double>, i32, <2 x double>, i8)
A Imm WriteThru Mask
```
Approximately handle avx512_{packssdw/packsswb/packusdw/packuswb} with
the existing handleVectorPackIntrinsic(), instead of relying on the
default (strict) handler.
https://github.com/llvm/llvm-project/pull/153927 incorrectly cast using
a hardcoded reduction factor of two, rather than using the parameter.
This caused false negatives but not false positives. (The only incorrect
case was a reduction factor of four; if four values {A,B,C,D} are being
reduced, the result is fully zero iff {A,B} and {C,D} are both zero
after pairwise reduction. If only one of those reduced pairs is zero,
then the quadwise reduction is non-zero.)
Currently visitIntrinsicInst() is a long, partly unsorted list. This patch groups them into cross-platform, X86 SIMD, and Arm SIMD families, making the overall intent of visitIntrinsicInst() clearer:
```
void visitIntrinsicInst(IntrinsicInst &I) {
if (maybeHandleCrossPlatformIntrinsic(I))
return;
if (maybeHandleX86SIMDIntrinsic(I))
return;
if (maybeHandleArmSIMDIntrinsic(I))
return;
if (maybeHandleUnknownIntrinsic(I))
return;
visitInstruction(I);
}
```
There is one disadvantage: the compiler will not tell us if the switch statements in the handlers have overlapping coverage.
This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512
equivalent of the pmaddw and pmaddubs intrinsics:
<16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
<32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64
x i8>, <64 x i8>) are currently handled strictly, which is suboptimal.
llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and
llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically
handled using maybeHandleSimpleNomemIntrinsic, which is incorrect.
Since the second argument is the shuffle order, we instrument all these
intrinsics using `handleIntrinsicByApplyingToShadow(...,
/*trailingVerbatimArgs=*/1)`
(https://github.com/llvm/llvm-project/pull/114490).
This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e.,
relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted
because it was subtly wrong: multiplying an uninitialized zero should
not result in an initialized zero.
This reland fixes the issue by using instrumentation analogous to
visitAnd (bitwise AND of an initialized zero and an uninitialized value
results in an initialized value). Additionally, this reland expands a
test case; fixes the commit message; and optimizes the change to avoid
the need for horizontalReduce.
The current instrumentation has false positives: it does not take into
account that multiplying an initialized zero value with an uninitialized
value results in an initialized zero value This change fixes the issue
during the multiplication step. The horizontal add step is modeled using
bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent
intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX
VNNI intrinsics.
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.
This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
e.g.,
<16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
<32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
<64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
Out A x b
where A and x are packed matrices, b is a vector, Out = A * x + b in
GF(2)
Multiplication in GF(2) is equivalent to bitwise AND. However, the
matrix computation also includes a parity calculation.
For the bitwise AND of bits V1 and V2, the exact shadow is:
Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow &
V2)
We approximate the shadow of gf2p8affine using:
Out_Shadow = _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0)
| _mm512_set1_epi8(b_Shadow)
This approximation has false negatives: if an intermediate dot-product
contains an even number of 1's, the parity is 0.
It has no false positives.
Updates the test from https://github.com/llvm/llvm-project/pull/149258
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.