The AVX/SSE variants are already handled heuristically (maybeHandleSimpleNomemIntrinsic via handleUnknownIntrinsic), but the AVX512 variants contain an additional parameter (the rounding method), which fails to match heuristically. This patch generalizes maybeHandleSimpleNomemIntrinsic to allow additional flags (ignored by MSan) and explicitly calls it to handle the AVX512 min/max ps/pd intrinsics.
It also updates the test added in https://github.com/llvm/llvm-project/pull/123980
https://github.com/llvm/llvm-project/pull/124159 uses
handleIntrinsicByApplyingToShadow for horizontal add/sub, but Vitaly
recommends always using the add version to avoid false negatives for
fully uninitialized data
(https://github.com/llvm/llvm-project/issues/124662).
This patch lays the groundwork by generalizing
handleIntrinsicByApplyingToShadow to allow using a different intrinsic
(of the same type as the original intrinsic) for the shadow. Planned
work will apply it to horizontal sub.
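The concern about horizontal sub can be seen on a single pair of adjacent shadow lanes. The sketch below is a toy model, not MSan code: the helper names are hypothetical, and it assumes the usual convention that a set shadow bit means "uninitialized".

```cpp
#include <cassert>
#include <cstdint>

// Shadow of one result lane if the *sub* operation is applied to the two
// adjacent shadow lanes (wrapping 16-bit arithmetic).
inline std::uint16_t pairShadowViaSub(std::uint16_t S0, std::uint16_t S1) {
  return static_cast<std::uint16_t>(S0 - S1);
}

// Shadow of one result lane if the *add* operation is applied instead.
inline std::uint16_t pairShadowViaAdd(std::uint16_t S0, std::uint16_t S1) {
  return static_cast<std::uint16_t>(S0 + S1);
}
```

For two fully uninitialized lanes (both shadows 0xFFFF), the sub version yields 0, which reads as fully initialized, while the add version yields 0xFFFE, which is still non-zero.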
This reverts commit b9d301cc7e4fe4c442ec15169686fa4a18f5cdfc i.e.,
relands db79fb2a91df31a07f312f8e061936927ac5c506.
I had mistakenly thought this caused a buildbot breakage (the actual
culprit was my other patch,
https://github.com/llvm/llvm-project/pull/123980, which landed at the
same time) and thus had reverted it even though AFAIK it is not broken.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
This patch adds explicit support for AVX masked load/store intrinsics,
largely by applying the intrinsics to the shadows (but subtly different
to handleIntrinsicByApplyingToShadow()).
We do not reuse the handleMaskedLoad/Store functions. The key challenge
is that the LLVM masked intrinsics require a vector of booleans, while
AVX masked intrinsics use the MSBs of a vector of integers.
X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad mentions that the x86
backend does not know how to efficiently convert from a vector of
booleans back into the AVX mask format; therefore, they (and we) do not
reduce AVX masked intrinsics into LLVM masked intrinsics.
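The difference between the two mask formats can be illustrated with a small model. This is a sketch under stated assumptions, not the MSan implementation: the type and function names are hypothetical, lanes are 32 bits wide, and masked-off lanes are assumed to read as zero.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes4 = std::array<std::uint32_t, 4>;

// AVX-style masked load: a lane is loaded only when the most significant bit
// of the corresponding mask lane is set; all other lanes read as 0.
inline Lanes4 avxStyleMaskedLoad(const Lanes4 &Memory, const Lanes4 &Mask) {
  Lanes4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    if (Mask[I] & 0x80000000u)
      Result[I] = Memory[I];
  return Result;
}

// The LLVM masked intrinsics expect one boolean per lane instead.
inline std::array<bool, 4> toBooleanMask(const Lanes4 &Mask) {
  std::array<bool, 4> Bits{};
  for (std::size_t I = 0; I < Bits.size(); ++I)
    Bits[I] = (Mask[I] & 0x80000000u) != 0;
  return Bits;
}
```

In this model, the shadow of the loaded value is obtained by running the same masked load over the shadow memory with the original mask: masked-off lanes come back as 0, which also means "initialized". A mask lane such as 1, whose MSB is clear, counts as masked off.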
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; however, to ensure some type safety, we'd like to
have all calls to getFirstNonPHI use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Horizontal add (hadd) and subtract (hsub) are currently heuristically
handled by `maybeHandleSimpleNomemIntrinsic()` (via
`handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing
the two operands. This has false positives for hadd/hsub shadows. For
example, suppose the shadows for the two operands are 00000000 and
11111111 respectively. The expected shadow for the result is 00001111,
but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111.
This patch handles horizontal add using
`handleIntrinsicByApplyingToShadow` (from
https://github.com/llvm/llvm-project/pull/114490), which has no false
positives for hadd/hsub: if each pair of adjacent shadow values is zero
(fully initialized), the result will be zero (fully initialized). More
generally, it is precise for hadd/hsub if at least one of the two
adjacent shadow values in each pair is zero.
It does have some false negatives for hadd/hsub: if we add/subtract two
adjacent non-zero shadow values, some bits of the result may incorrectly
be zero. We consider this an acceptable tradeoff for performance. To
make shadow propagation precise, we want the equivalent of "horizontal
OR", but this is not available. Reducing horizontal OR to (permutation
plus bitwise OR) is left as an exercise for the reader.
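The three ways of computing the shadow can be compared on a toy model with eight 16-bit lanes. This is only a sketch: the function names are hypothetical, the lane layout (pairs of the first operand in the low half, pairs of the second in the high half) is assumed, and a set shadow bit means "uninitialized".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8 = std::array<std::uint16_t, 8>;

// Toy horizontal add: adjacent pairs of A fill the low half of the result,
// adjacent pairs of B fill the high half (wrapping on overflow).
inline Vec8 horizontalAdd(const Vec8 &A, const Vec8 &B) {
  Vec8 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[I] = static_cast<std::uint16_t>(A[2 * I] + A[2 * I + 1]);
    R[I + 4] = static_cast<std::uint16_t>(B[2 * I] + B[2 * I + 1]);
  }
  return R;
}

// Same lane layout, but each adjacent pair is combined with bitwise OR.
inline Vec8 horizontalOr(const Vec8 &A, const Vec8 &B) {
  Vec8 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[I] = static_cast<std::uint16_t>(A[2 * I] | A[2 * I + 1]);
    R[I + 4] = static_cast<std::uint16_t>(B[2 * I] | B[2 * I + 1]);
  }
  return R;
}

// Heuristic handling: lane-wise OR of the two operand shadows.
inline Vec8 lanewiseOr(const Vec8 &SA, const Vec8 &SB) {
  Vec8 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = static_cast<std::uint16_t>(SA[I] | SB[I]);
  return R;
}
```

With operand shadows of all zeros and all ones, lanewiseOr marks all eight result lanes as uninitialized, while horizontalAdd and horizontalOr leave the low four lanes clean. Two adjacent shadows of 0x8000 sum to 0 under horizontalAdd, which is the false negative described above; horizontalOr keeps 0x8000.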
The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fit into the overall shadow mapping calculation.
`handleIntrinsicByApplyingToShadow` (introduced in
https://github.com/llvm/llvm-project/pull/114490) requires that the
intrinsic supports integer-ish operands; this is not the case for all
intrinsics. This patch generalizes the function to bitcast the shadow
arguments to be the same type as the original intrinsic, thus
guaranteeing that the intrinsic exists. Additionally, it casts the
computed shadow to be an appropriate shadow type.
This function assumes that the intrinsic will handle arbitrary
bit-patterns (for example, if the intrinsic accepts floats for var1, we
assume that it works normally even if inputs are NaNs etc.).
This adds an experimental flag, msan-dump-strict-intrinsics (modeled
after msan-dump-strict-instructions), which prints out any intrinsics
that are heuristically handled. Additionally, MSan will print out
heuristically handled intrinsics when -debug is passed as a flag in
debug builds.
MSan's intrinsic handling can be broken down into:
1) special cases (usually highly accurate)
2) heuristic handling (sometimes erroneous)
3) not handled
This patch's -msan-dump-strict-intrinsics is intended to help debug Case
2). Case 3) (which includes all the intrinsics that are handled neither by
special cases nor by heuristics) can be debugged using the existing
-msan-dump-strict-instructions.
This patch adds the ability to specify alignment for
llvm.masked.expandload/llvm.masked.compressstore intrinsics in IRBuilder
(this is mostly NFC for now since it's only used in MemorySanitizer, but
there is an intention to generate these intrinsics in the compiler
passes, e.g. in LoopVectorizer)
This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.
This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
CTMark #113200 size overhead was 5.3%, now it's 4.7%.
The patch affects only signed integers.
https://alive2.llvm.org/ce/z/Lv5hyi
* The patch replaces code which extracted the sign bit,
maximized/minimized it, then packed it back, with a
simple sign bit flip. Another way to think about the
transformation is as a subtraction of MIN_SINT from
A/B. This maps MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.
* To maximize/minimize A/B we then don't need
to extract the sign bit; we can apply the shadow the
same way as to the other bits.
* After the sign bit flip, we have to switch to the unsigned
versions of the predicates.
* After the change above, getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine them into a single function.
* Because the function does the sign bit flip and
requires unsigned predicates to be used for the returned values,
there is no point in keeping it as a member of the class;
to hide it, we switch to a function-local lambda.
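The sign bit flip can be modeled on 8-bit values. This is a sketch rather than the code in MemorySanitizer.cpp: the helper names are hypothetical, a set shadow bit means "unknown", and the rule used to decide whether `A < B` is poisoned (compare the two extreme cases and check whether they disagree) is an assumption about how the minimum/maximum values are consumed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Flipping the sign bit maps the signed order onto the unsigned order:
// INT8_MIN -> 0x00, 0 -> 0x80, INT8_MAX -> 0xFF.
inline std::uint8_t flipSign(std::int8_t V) {
  return static_cast<std::uint8_t>(static_cast<std::uint8_t>(V) ^ 0x80u);
}

// Lowest and highest values V could take (in the flipped, unsigned domain),
// given that every bit set in Shadow is unknown.
inline std::pair<std::uint8_t, std::uint8_t>
minMaxUnsigned(std::int8_t V, std::uint8_t Shadow) {
  const std::uint8_t U = flipSign(V);
  const std::uint8_t Min = static_cast<std::uint8_t>(U & ~Shadow);
  const std::uint8_t Max = static_cast<std::uint8_t>(U | Shadow);
  return std::make_pair(Min, Max);
}

// The signed comparison A < B is poisoned if its outcome differs between the
// most favorable and least favorable assignments of the unknown bits.
inline bool lessThanIsPoisoned(std::int8_t A, std::uint8_t SA, std::int8_t B,
                               std::uint8_t SB) {
  const std::pair<std::uint8_t, std::uint8_t> PA = minMaxUnsigned(A, SA);
  const std::pair<std::uint8_t, std::uint8_t> PB = minMaxUnsigned(B, SB);
  const bool MostFavorable = PA.first < PB.second;  // unsigned predicate
  const bool LeastFavorable = PA.second < PB.first; // unsigned predicate
  return MostFavorable != LeastFavorable;
}
```

For example, if A is 0 with an unknown sign bit, it is either 0 or -128. Comparing against a fully defined 1 gives the same answer in both cases, so the result is defined; comparing against -1 does not, so the result is poisoned.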
Fixes #111212.
This grows .text by 5.3% on CTMark (or 2.6% on a large internal binary).
Perf regressed by 1.6%. We will try to improve this in follow-up patches.
It is worth paying some performance regression to fix
correctness and avoid issues like #111212.
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e., just lookup, no creation).
Everything in this pass uses a single addrspace 0 pointer type.
Don't try to create it using the typed pointer ctor.
This allows removing the type argument from
getShadowPtrForVAArgument().
This PR changes the sanitizer passes to be idempotent.
When any sanitizer pass is run after it has already been run,
double instrumentation is seen in the resulting IR. This happens because
there is no check in the pass to verify whether the IR has already been
instrumented.
This PR checks whether the "nosanitize_*" module flag is already present and,
if so, returns early without running the pass again.
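The early-return check can be sketched with a toy module type. Everything here is a hypothetical stand-in: ToyModule is not llvm::Module, the counter merely represents inserted instrumentation, and the assumption that the pass itself sets the flag after instrumenting is mine rather than something stated above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an IR module with a set of module flags.
struct ToyModule {
  std::set<std::string> Flags;
  int TimesInstrumented = 0;
};

// Returns true if the pass instrumented the module, false if it bailed out
// because the flag shows the module was already instrumented.
inline bool runSanitizerPassOnce(ToyModule &M, const std::string &Flag) {
  if (M.Flags.count(Flag))
    return false;          // already instrumented: leave the IR untouched
  ++M.TimesInstrumented;   // stands in for inserting instrumentation
  M.Flags.insert(Flag);    // assumed: mark the module as instrumented
  return true;
}
```

Running the function twice with a flag name following the "nosanitize_*" pattern instruments the toy module only once.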
This adds support for the Arm NEON vector shift instructions that follow
the same pattern as x86 (handleVectorShiftIntrinsic).
VSLI is not supported because it does not follow the 2-argument pattern
expected by handleVectorShiftIntrinsic.
This patch also updates the arm64-vshift.ll MSan test that was
introduced in
5d0a12d3e9
Cloning the vst_ intrinsics to apply them to the shadows did not work if
the arguments were floating-point, since the shadows are integers. This
patch changes MSan to create an intrinsic of the correct integer types.
Additionally, this patch adds support for vst1x_{2,3,4}; these can be
handled similarly to vst_{2,3,4}, since in all cases we are adapting the
corresponding intrinsic.
This also updates the tests.
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
zext does not allow converting vector shadow to scalar, so we must
manually convert it prior to calling zext in materializeOneCheck, for
which the 'ConvertedShadow' parameter isn't actually guaranteed to be
scalar (1). Note that it is safe/no-op to call convertShadowToScalar on
a shadow that is already scalar.
In contrast, the storeOrigin function already converts the (potentially
vector) shadow to scalar; we add a comment to note why it is load
bearing.
(1) In materializeInstructionChecks():
"// Disable combining in some cases. TrackOrigins checks each shadow to
// pick correct origin.
bool Combine = !MS.TrackOrigins;
...
if (!Combine) {
materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
continue;
}"
The default intrinsic handling was to report any
uninitialized part of an argument. However, these intrinsics
use a mask which allows parts of the input to be ignored, so
it's OK for vectors to be partially initialized.