cvt(t?)ps2dq/cvt(t?)pd2dq and cvtpd2ps are currently handled strictly.
This patch handles them using handleSSEVectorConvertIntrinsicByProp
(from https://github.com/llvm/llvm-project/pull/130705), generalized to
handle SSE intrinsics that do not have a rounding mode parameter.
This adds an explicit handler for:
- llvm.aarch64.neon.ld1x2, llvm.aarch64.neon.ld1x3,
llvm.aarch64.neon.ld1x4
- llvm.aarch64.neon.ld2, llvm.aarch64.neon.ld3, llvm.aarch64.neon.ld4
- llvm.aarch64.neon.ld2lane, llvm.aarch64.neon.ld3lane,
llvm.aarch64.neon.ld4lane
- llvm.aarch64.neon.ld2r, llvm.aarch64.neon.ld3r, llvm.aarch64.neon.ld4r
instead of relying on the default strict handler.
Updates the tests from https://github.com/llvm/llvm-project/pull/125267
Check whether each lane is fully initialized, and propagate the shadow
per lane instead of using the strict handling of visitInstruction.
Updates the tests from https://github.com/llvm/llvm-project/pull/129807
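As a rough sketch of the per-lane propagation for these deinterleaving loads (a standalone toy model, not the IR-level MSan implementation):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model of per-lane shadow propagation for an ld2-style deinterleaving
// load: each result lane's shadow comes from the corresponding
// shadow-memory byte, instead of strictly flagging the whole load.
std::pair<std::vector<uint8_t>, std::vector<uint8_t>>
ld2Shadow(const uint8_t *ShadowMem, size_t NumPairs) {
  std::vector<uint8_t> S0(NumPairs), S1(NumPairs);
  for (size_t I = 0; I < NumPairs; ++I) {
    S0[I] = ShadowMem[2 * I];     // shadow of lane I of the first result
    S1[I] = ShadowMem[2 * I + 1]; // shadow of lane I of the second result
  }
  return {S0, S1};
}
```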
Changes the handling of:
- llvm.aarch64.neon.smaxv
- llvm.aarch64.neon.sminv
- llvm.aarch64.neon.umaxv
- llvm.aarch64.neon.uminv
- llvm.vector.reduce.smax
- llvm.vector.reduce.smin
- llvm.vector.reduce.umax
- llvm.vector.reduce.umin
- llvm.vector.reduce.fmax
- llvm.vector.reduce.fmin
from the default strict handling (visitInstruction) to
handleVectorReduceIntrinsic.
Also adds a parameter to handleVectorReduceIntrinsic to specify whether
the return type must match the vector's element type.
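For intuition, a toy model of the propagation (not the MSan code, which works on IR shadows):

```cpp
#include <cstdint>
#include <vector>

// Toy model of handleVectorReduceIntrinsic: the reduced result's shadow is
// the OR of all lane shadows, so any uninitialized lane taints the result.
uint32_t reduceShadow(const std::vector<uint32_t> &LaneShadows) {
  uint32_t S = 0;
  for (uint32_t L : LaneShadows)
    S |= L;
  return S;
}
```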
Updates the tests from https://github.com/llvm/llvm-project/pull/129741,
https://github.com/llvm/llvm-project/pull/129810,
https://github.com/llvm/llvm-project/pull/129768
Change the handling of:
- llvm.aarch64.neon.fmaxp
- llvm.aarch64.neon.fminp
- llvm.aarch64.neon.fmaxnmp
- llvm.aarch64.neon.fminnmp
- llvm.aarch64.neon.smaxp
- llvm.aarch64.neon.sminp
- llvm.aarch64.neon.umaxp
- llvm.aarch64.neon.uminp
from the incorrect heuristic handler (maybeHandleSimpleNomemIntrinsic)
to handlePairwiseShadowOrIntrinsic.
Updates the tests from https://github.com/llvm/llvm-project/pull/129760
Adds a note that maybeHandleSimpleNomemIntrinsic may incorrectly match
horizontal/pairwise intrinsics.
The module currently stores the target triple as a string. This means
that any code that wants to actually use the triple first has to
instantiate a Triple, which is somewhat expensive. The change in #121652
caused a moderate compile-time regression due to this. While it would be
easy enough to work around, I think that architecturally, it makes more
sense to store the parsed Triple in the module, so that it can always be
directly queried.
For this change, I've opted not to add any magic conversions between
std::string and Triple for backwards-compatibility purposes, and instead
write out the needed Triple()s or str()s explicitly. This is because I think
a decent number of them should be changed to work on Triple as well, to
avoid unnecessary conversions back and forth.
The only interesting part of this patch is that the default triple is
Triple("") instead of Triple(), to preserve existing behavior: the former
defaults to the ELF object format rather than the unknown object format.
We should fix that as well.
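For illustration, a sketch of a typical call site after the change, assuming getTargetTriple() now returns the stored Triple directly:

```cpp
#include "llvm/IR/Module.h"
#include "llvm/TargetParser/Triple.h"
using namespace llvm;

// Illustrative call site (a sketch, not taken from the patch):
bool usesELF(const Module &M) {
  // Before: Triple TT(M.getTargetTriple());  // re-parse on every query
  const Triple &TT = M.getTargetTriple();     // already parsed, cheap
  return TT.isOSBinFormatELF();
}
```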
x86 pairwise add and sub are currently handled by applying the pairwise add intrinsic to the shadow (https://github.com/llvm/llvm-project/pull/124835), due to the lack of an x86 pairwise OR intrinsic. handlePairwiseShadowOrIntrinsic was added (https://github.com/llvm/llvm-project/pull/126008) to handle Arm
pairwise add, but assumes that the intrinsic operates on each pair of elements as defined by the LLVM type. In contrast, x86 pairwise add/sub may have, e.g., `<1 x i64>` as a parameter type while actually operating on `<2 x i32>`.
This patch generalizes handlePairwiseShadowOrIntrinsic to allow reinterpreting the parameters as vectors with a specified element size, and then uses this function to handle x86 pairwise add/sub.
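A toy model of the reinterpretation for a pairwise add whose parameters are typed as one 64-bit element but really hold two 32-bit lanes each (standalone C++, not the MSan code):

```cpp
#include <cstdint>

// Each output lane's shadow is the OR of one adjacent pair of input-lane
// shadows, taken across the concatenation of both operands.
uint64_t pairwiseShadow64(uint64_t SA, uint64_t SB) {
  uint32_t Lanes[4] = {uint32_t(SA), uint32_t(SA >> 32),
                       uint32_t(SB), uint32_t(SB >> 32)};
  uint32_t Out0 = Lanes[0] | Lanes[1]; // pair from the first operand
  uint32_t Out1 = Lanes[2] | Lanes[3]; // pair from the second operand
  return uint64_t(Out0) | (uint64_t(Out1) << 32);
}
```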
Apply handleShadowOr to llvm.[us]cmp. Previously, llvm.[us]cmp was correctly handled heuristically when each parameter type is the same as the return type (e.g., `call i8 @llvm.ucmp.i8.i8(i8 %x, i8 %y)`) but handled incorrectly by visitInstruction when the return type is different (e.g., `call i8 @llvm.ucmp.i8.i62(i62 %x, i62 %y)` or `call <4 x i8> @llvm.ucmp.v4i8.v4i32(<4 x i32> %x, <4 x i32> %y)`).
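A plausible scalar model of the propagation (the real handler works on IR shadows; the all-or-nothing collapse reflects that any input bit can change a comparison result):

```cpp
#include <cstdint>

// If either operand has any uninitialized bit, the (possibly narrower)
// i8 result is fully uninitialized; otherwise it is fully initialized.
uint8_t ucmpShadow(uint64_t SA, uint64_t SB) {
  return (SA | SB) ? 0xFF : 0x00;
}
```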
Updates the tests from https://github.com/llvm/llvm-project/pull/125790
This handles the following llvm.aarch64.neon intrinsics, which were suboptimally handled by visitInstruction:
- fcvtas, fcvtau
- fcvtms, fcvtmu
- fcvtns, fcvtnu
- fcvtps, fcvtpu
- fcvtzs, fcvtzu
The old instrumentation checked that the shadow of every element of the input vector was fully initialized, and aborted otherwise. The new instrumentation propagates the shadow: each element of the output is marked initialized iff the corresponding element of the input is *fully* initialized (since these are floating-point to integer conversions, any uninitialized input bit can affect the result).
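A toy model of the new propagation (standalone C++, not the IR-level implementation):

```cpp
#include <array>
#include <cstdint>

// 4-lane float->int conversion: an output lane is fully uninitialized
// unless its input lane's shadow is zero (fully initialized).
std::array<uint32_t, 4> fcvtShadow(const std::array<uint32_t, 4> &InShadow) {
  std::array<uint32_t, 4> OutShadow;
  for (int I = 0; I < 4; ++I)
    OutShadow[I] = InShadow[I] ? ~0u : 0u;
  return OutShadow;
}
```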
Updates the tests from https://github.com/llvm/llvm-project/pull/126095
This patch adds a function, handlePairwiseShadowOrIntrinsic, that ORs
pairs of adjacent shadow values; this is suitable for propagating shadow
for intrinsics with one or two vector operands that combine adjacent
fields. It then applies handlePairwiseShadowOrIntrinsic to Arm NEON
pairwise add:
llvm.aarch64.neon.{addhn, raddhn} (currently incorrectly handled) and
llvm.aarch64.neon.{saddlp, uaddlp} (currently suboptimally handled).
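A toy model of the pairwise OR on a single input vector (widening and casting details omitted; not the MSan code):

```cpp
#include <cstdint>
#include <vector>

// Output lane I is the OR of input-lane shadows 2*I and 2*I+1, so a pair
// is clean iff both of its elements are clean.
std::vector<uint32_t> pairwiseOrShadow(const std::vector<uint32_t> &S) {
  std::vector<uint32_t> Out(S.size() / 2);
  for (size_t I = 0; I < Out.size(); ++I)
    Out[I] = S[2 * I] | S[2 * I + 1];
  return Out;
}
```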
Updates the tests from https://github.com/llvm/llvm-project/pull/125820.
Apply handleVectorReduceIntrinsic() to llvm.aarch64.neon.[su]addlv.
Previously, these were unknown intrinsics handled suboptimally by
visitInstruction.
Updates the tests from https://github.com/llvm/llvm-project/pull/125761
Apply handleVectorReduceIntrinsic() to Intrinsic::aarch64_neon_f{min,max}(mn)?v. Previously, these intrinsics were handled correctly (by maybeHandleSimpleNomemIntrinsic) if each parameter's type was the same as the return type; otherwise, they were handled suboptimally by visitInstruction().
Updates the tests from https://github.com/llvm/llvm-project/pull/125729.
This prevents the handlers from being called with blatantly inappropriate intrinsics.
Currently, if the handlers are called with an intrinsic that doesn't have enough arguments, it may abort; that is bad, but visible. The more insidious risk is that a handler is called with an intrinsic that has more arguments than expected; that will not visibly fail.
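A toy illustration of the guard (not the actual handler code):

```cpp
#include <cassert>
#include <vector>

// A handler that assumes exactly two operands asserts on the count, so a
// mismatched intrinsic fails loudly instead of silently ignoring extras.
unsigned orShadows(const std::vector<unsigned> &OperandShadows) {
  assert(OperandShadows.size() == 2 && "handler expects exactly two operands");
  return OperandShadows[0] | OperandShadows[1];
}
```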
This adds handleVectorReduceWithStarterIntrinsic() (similar to
handleVectorReduceIntrinsic but for intrinsics with an additional
starting parameter) and uses it to handle
Intrinsic::vector_reduce_f{add,mul}.
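A sketch of the intended propagation (toy model, not the MSan code):

```cpp
#include <cstdint>
#include <vector>

// For reductions with a start value (e.g. vector.reduce.fadd), the result
// shadow ORs the starter's shadow with every lane's shadow.
uint32_t reduceWithStarterShadow(uint32_t StarterShadow,
                                 const std::vector<uint32_t> &LaneShadows) {
  uint32_t S = StarterShadow;
  for (uint32_t L : LaneShadows)
    S |= L;
  return S;
}
```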
Updates the tests from https://github.com/llvm/llvm-project/pull/125597
This generalizes handleVectorReduceIntrinsic to allow intrinsics where
the return type is not the same as the vector's element type. It then applies
the generalized handleVectorReduceIntrinsic to support the following Arm
NEON add reduction to scalar intrinsics: llvm.aarch64.neon.{faddv,
saddv, uaddv}.
Updates the tests from https://github.com/llvm/llvm-project/pull/125271
The AVX/SSE variants are already handled heuristically (maybeHandleSimpleNomemIntrinsic via handleUnknownIntrinsic), but the AVX512 variants take an additional parameter (the rounding method) that prevents the heuristic from matching. This patch generalizes maybeHandleSimpleNomemIntrinsic to allow additional flags (ignored by MSan) and explicitly calls it to handle the AVX512 min/max ps/pd intrinsics.
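A toy model of why the extra operand can be ignored (not the MSan code):

```cpp
#include <cstdint>

// The trailing rounding-mode operand selects rounding behavior but cannot
// make initialized bits uninitialized, so the shadow computation ignores
// it and just ORs the data operands' shadows.
uint32_t minMaxShadow(uint32_t SA, uint32_t SB, unsigned /*RoundingMode*/) {
  return SA | SB;
}
```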
It also updates the test added in https://github.com/llvm/llvm-project/pull/123980
https://github.com/llvm/llvm-project/pull/124159 uses
handleIntrinsicByApplyingToShadow for horizontal add/sub, but Vitaly
recommends always using the add version to avoid false negatives for
fully uninitialized data
(https://github.com/llvm/llvm-project/issues/124662).
This patch lays the groundwork by generalizing
handleIntrinsicByApplyingToShadow to allow using a different intrinsic
(of the same type as the original intrinsic) for the shadow. Planned
work will apply it to horizontal sub.
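A toy demonstration of the false negative motivating this (standalone C++):

```cpp
#include <cstdint>
#include <cstdio>

// Applying sub to two identical all-ones shadows cancels to zero, so
// fully-uninitialized data looks initialized; add keeps the result nonzero.
int main() {
  uint8_t S = 0xFF; // fully uninitialized lane shadow
  std::printf("sub-based shadow: %02x\n", (unsigned)uint8_t(S - S)); // 00
  std::printf("add-based shadow: %02x\n", (unsigned)uint8_t(S + S)); // fe
  return 0;
}
```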
This reverts commit b9d301cc7e4fe4c442ec15169686fa4a18f5cdfc i.e.,
relands db79fb2a91df31a07f312f8e061936927ac5c506.
I had mistakenly thought this caused a buildbot breakage (the actual
culprit was my other patch,
https://github.com/llvm/llvm-project/pull/123980, which landed at the
same time) and thus had reverted it even though AFAIK it is not broken.
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
This patch adds explicit support for AVX masked load/store intrinsics,
largely by applying the intrinsics to the shadows (though in a subtly
different way than handleIntrinsicByApplyingToShadow()).
We do not reuse the handleMaskedLoad/Store functions. The key challenge
is that the LLVM masked intrinsics require a vector of booleans, while
AVX masked intrinsics use the MSBs of a vector of integers.
X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad mentions that the x86
backend does not know how to efficiently convert from a vector of
booleans back into the AVX mask format; therefore, they (and we) do not
reduce AVX masked intrinsics into LLVM masked intrinsics.
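A toy model of the mask-format mismatch (standalone C++, not the MSan code):

```cpp
#include <array>
#include <cstdint>

// LLVM masked intrinsics take <N x i1>, while AVX masked loads use the
// sign bit (MSB) of each i32 lane as the per-lane enable.
std::array<bool, 4> avxMaskToBools(const std::array<uint32_t, 4> &Mask) {
  std::array<bool, 4> B;
  for (int I = 0; I < 4; ++I)
    B[I] = (Mask[I] >> 31) & 1; // lane enabled iff MSB set
  return B;
}
```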
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago, but to ensure some type safety, we'd like all calls
to getFirstNonPHI to use the iterator-returning version.
This patch changes a bunch of call-sites calling getFirstNonPHI to use
getFirstNonPHIIt, which returns an iterator. All these call sites are
where it's obviously safe to fetch the iterator then dereference it. A
follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
getFirstNonPHI, but not before adding concise documentation of what
considerations are needed (very few).
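An illustrative call-site migration (a sketch, not taken from this patch):

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Insert at the first non-PHI position using the iterator form, which
// carries the debug-info bit that the raw Instruction* form loses.
void insertAtFirstNonPHI(BasicBlock &BB, StoreInst *Store) {
  // Before: Store->insertBefore(BB.getFirstNonPHI());
  BasicBlock::iterator It = BB.getFirstNonPHIIt();
  Store->insertBefore(BB, It);
}
```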
---------
Co-authored-by: Stephen Tozer <Melamoto@gmail.com>
Horizontal add (hadd) and subtract (hsub) are currently heuristically
handled by `maybeHandleSimpleNomemIntrinsic()` (via
`handleUnknownIntrinsic()`), which computes the shadow by bitwise OR'ing
the two operands. This has false positives for hadd/hsub shadows. For
example, suppose the shadows for the two operands are 00000000 and
11111111 respectively. The expected shadow for the result is 00001111,
but `maybeHandleSimpleNomemIntrinsic` would compute it as 11111111.
This patch handles horizontal add using
`handleIntrinsicByApplyingToShadow` (from
https://github.com/llvm/llvm-project/pull/114490), which has no false
positives for hadd/hsub: if each pair of adjacent shadow values is zero
(fully initialized), the result will be zero (fully initialized). More
generally, it is precise for hadd/hsub if at least one of the two
adjacent shadow values in each pair is zero.
It does have some false negatives for hadd/hsub: if we add/subtract two
adjacent non-zero shadow values, some bits of the result may incorrectly
be zero. We consider this an acceptable tradeoff for performance. To
make shadow propagation precise, we want the equivalent of "horizontal
OR", but this is not available. Reducing horizontal OR to (permutation
plus bitwise OR) is left as an exercise for the reader.
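A standalone toy model of the worked example above (one-bit lane shadows; not the MSan code):

```cpp
#include <cstdio>

// OR'ing whole operands taints every output lane, while pairing adjacent
// lanes (hadd output lanes 0..3 come from A, 4..7 from B) yields the
// expected 00001111 for A = 00000000, B = 11111111.
int main() {
  int A[8] = {0};                       // 00000000: fully initialized
  int B[8] = {1, 1, 1, 1, 1, 1, 1, 1};  // 11111111: fully uninitialized
  for (int I = 0; I < 8; ++I) {
    const int *Src = I < 4 ? A : B;
    int Pair = Src[(I % 4) * 2] | Src[(I % 4) * 2 + 1];
    std::printf("%d", Pair); // prints 00001111
  }
  std::printf("\n");
  return 0;
}
```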
The stated return type was incorrect; this patch corrects it. More generally, it explains how the Offset and its components fit into the overall shadow mapping calculation.
`handleIntrinsicByApplyingToShadow` (introduced in
https://github.com/llvm/llvm-project/pull/114490) requires that the
intrinsic supports integer-ish operands; this is not the case for all
intrinsics. This patch generalizes the function to bitcast the shadow
arguments to be the same type as the original intrinsic, thus
guaranteeing that the intrinsic exists. Additionally, it casts the
computed shadow to be an appropriate shadow type.
This function assumes that the intrinsic will handle arbitrary
bit-patterns (for example, if the intrinsic accepts floats for var1, we
assume that it works normally even if inputs are NaNs etc.).
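A toy model of the bitcast round-trip in ordinary C++ (the real code operates on IR values and types):

```cpp
#include <bit>
#include <cstdint>

// An integer shadow is reinterpreted as the intrinsic's operand type
// (here float), the operation is applied to the shadow bits, and the
// result is reinterpreted back to a shadow integer.
uint32_t shadowViaFloatOp(uint32_t Shadow, float (*Op)(float)) {
  float AsFloat = std::bit_cast<float>(Shadow); // shadow bits as float
  float Result = Op(AsFloat);                   // op applied to shadow
  return std::bit_cast<uint32_t>(Result);       // back to shadow type
}
```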
This adds an experimental flag, msan-dump-strict-intrinsics (modeled
after msan-dump-strict-instructions), which prints out any intrinsics
that are heuristically handled. Additionally, MSan will print out
heuristically handled intrinsics when -debug is passed as a flag in
debug builds.
MSan's intrinsic handling can be broken down into:
1) special cases (usually highly accurate)
2) heuristic handling (sometimes erroneous)
3) not handled
This patch's -msan-dump-strict-intrinsics is intended to help debug Case
2. Case 3 (intrinsics that are handled by neither special cases nor
heuristics) can be debugged using the existing
-msan-dump-strict-instructions.
This patch adds the possibility to specify the alignment for the
llvm.masked.expandload/llvm.masked.compressstore intrinsics in IRBuilder.
(This is mostly NFC for now, since it's only used in MemorySanitizer, but
there is an intention to generate these intrinsics in compiler passes,
e.g., in LoopVectorizer.)
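A sketch of what a caller might look like, assuming the alignment is passed as a MaybeAlign between the pointer and the mask (the parameter position is an assumption):

```cpp
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Hypothetical caller: emit an expanding load with explicit alignment.
Value *emitExpandLoad(IRBuilder<> &Builder, Type *VecTy, Value *Ptr,
                      Value *Mask, Value *PassThru) {
  return Builder.CreateMaskedExpandLoad(VecTy, Ptr, MaybeAlign(16), Mask,
                                        PassThru, "expandload");
}
```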
This adds a general function that handles intrinsics by applying the
intrinsic to the shadows, and applies it to the specific case of Arm
NEON TBL/TBX intrinsics.
This also updates the tests from
https://github.com/llvm/llvm-project/pull/114462
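A toy model of applying a table lookup to the shadow (standalone C++; TBL yields 0 for out-of-range indices, so the shadow lookup mirrors that):

```cpp
#include <array>
#include <cstdint>

// The result shadow of a table lookup is the same lookup performed on the
// table's shadow, with the (separately checked) index used verbatim.
uint8_t tblShadow(const std::array<uint8_t, 16> &TableShadow, uint8_t Idx) {
  return Idx < 16 ? TableShadow[Idx] : 0; // out-of-range lanes read as 0
}
```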
CTMark #113200 size overhead was 5.3%, now it's 4.7%.
The patch affects only signed integers.
https://alive2.llvm.org/ce/z/Lv5hyi
* The patch replaces code which extracted the sign bit,
maximized/minimized it, then packed it back, with a
simple sign bit flip. Another way to think about the
transformation is as a subtraction of MIN_SINT from
A/B: we map MIN_SINT to 0, 0 to -MIN_SINT, and
MAX_SINT to MAX_UINT.
* Then, to maximize/minimize A/B, we don't need
to extract the sign bit; we can apply the shadow the
same way as to the other bits.
* After the sign bit flip, we have to switch to the
unsigned versions of the predicates.
* After the change above, getHighestPossibleValue/getLowestPossibleValue
became very similar, so we can combine them into a single function.
* Because the function does the sign bit flip and requires
unsigned predicates for the returned values, there is no
point in keeping it as a class member; to hide it, we
switch to a function-local lambda.
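A quick standalone check of the sign-flip identity the bullets above rely on:

```cpp
#include <cassert>
#include <cstdint>

// Flipping the sign bit turns signed order into unsigned order
// (equivalent to subtracting MIN_SINT).
int main() {
  const int8_t Vals[] = {-128, -1, 0, 1, 127};
  for (int8_t A : Vals)
    for (int8_t B : Vals)
      assert((A < B) == (uint8_t(A ^ 0x80) < uint8_t(B ^ 0x80)));
  return 0;
}
```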