This patch is moving out following intrinsics:
* vector.interleave2/deinterleave2
* vector.reverse
* vector.splice
from the experimental namespace.
All these intrinsics exist in LLVM for more than a year now, and are
widely used, so should not be considered as experimental.
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
Rename has/dropPoisonGeneratingFlagsOrMetadata to
has/dropPoisonGeneratingAnnotations and make it also handle
nonnull, align and range return attributes on calls, similar
to the existing handling for !nonnull, !align and !range metadata.
Change all the cstval_pred_ty based PatternMatch helpers (things like
m_AllOnes and m_Zero) to only allow poison elements inside vector
splats, not undef elements.
Historically, we used to represent non-demanded elements in vectors
using undef. Nowadays, we use poison instead. As such, I believe that
support for undef in vector splats is no longer useful.
At the same time, while poison splat elements are pretty much always
safe to ignore, this is not generally the case for undef elements. We
have existing miscompiles in our tests due to this (see the
masked-merge-*.ll tests changed here) and it's easy to miss such cases
in the future, now that we write tests using poison instead of undef
elements.
I think overall, keeping support for undef elements no longer makes
sense, and we should drop it. Once this is done consistently, I think we
may also consider allowing poison in m_APInt by default, as doing that
change is much less risky than doing the same with undef.
This change involves a substantial amount of test changes. For most
tests, I've just replaced undef with poison, as I don't think there is
value in retaining both. For some tests (where the distinction between
undef and poison is important), I've duplicated tests.
Prior to #85863, the required parameters of llvm::isKnownNonZero were
Value and DataLayout. After, they are Value, Depth, and SimplifyQuery,
where SimplifyQuery is implicitly constructible from DataLayout. The
change to move Depth before SimplifyQuery needed callers to be updated
unnecessarily, and as commented in #85863, we actually want Depth to be
after SimplifyQuery anyway so that it can be defaulted and the caller
does not need to specify it.
Teaching ConstantFoldLoadFromUniformValue that types that are padded in
memory can't be considered as uniform.
Using the big hammer to prevent optimizations when loading from a
constant for which DataLayout::typeSizeEqualsStoreSize would return
false.
Main problem solved would be something like this:
store i17 -1, ptr %p, align 4
%v = load i8, ptr %p, align 1
If for example the i17 occupies 32 bits in memory, then LLVM IR doesn't
really tell where the padding goes. And even if we assume that the 15
most significant bits are padding, then they should be considered as
undefined (even if LLVM backend typically would pad with zeroes).
Anyway, for a big-endian target the load would read those most
significant bits, which aren't guaranteed to be one's. So it would be
wrong to constant fold the load as returning -1.
If LLVM IR had been more explicit about the placement of padding, then
we could allow the constant fold of the load in the example, but only
for little-endian.
Fixes: https://github.com/llvm/llvm-project/issues/81793
Fold gc.relocate of undef and null to undef and null respectively.
Similar transform is currently done by instcombine, but there is no
reason to not include it here as well.
This patch refactors the interface of the `computeKnownFPClass` family
to pass `SimplifyQuery` directly.
The motivation of this patch is to compute known fpclass with
`DomConditionCache`, which was introduced by
https://github.com/llvm/llvm-project/pull/73662. With
`DomConditionCache`, we can do more optimization with context-sensitive
information.
Example (extracted from
[fmt/format.h](e17bc67547/include/fmt/format.h (L3555-L3566))):
```
define float @test(float %x, i1 %cond) {
%i32 = bitcast float %x to i32
%cmp = icmp slt i32 %i32, 0
br i1 %cmp, label %if.then1, label %if.else
if.then1:
%fneg = fneg float %x
br label %if.end
if.else:
br i1 %cond, label %if.then2, label %if.end
if.then2:
br label %if.end
if.end:
%value = phi float [ %fneg, %if.then1 ], [ %x, %if.then2 ], [ %x, %if.else ]
%ret = call float @llvm.fabs.f32(float %value)
ret float %ret
}
```
We can prove the signbit of `%value` is always zero. Then the fabs can
be eliminated.
This patch merges the logic of `cannotBeOrderedLessThanZeroImpl` into
`computeKnownFPClass` to improve the signbit inference.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
The specialisation will not be valid when ConstantInt gains native
support for vector types.
This is largely a mechanical change but with extra attention paid to constant
folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to
remove the need to call `getIntegerType()`.
Co-authored-by: Nikita Popov <github@npopov.com>
This patch passes `SimplifyQuery` to `computeKnownBits` directly in
`InstSimplify` and `InstCombine`.
As the `DomConditionCache` in #73662 is only used in `InstCombine`, it
is inconvenient to introduce a new argument `DC` to `computeKnownBits`.
When folding urem instructions we can end up not recognizing that
the output will always be 0 due to Value*s being different, despite
generating the same data (in this case, 2 different calls to vscale).
This patch recognizes the (x << N) & (add (x << M), -1) pattern that
instcombine replaces urem with after the two vscale calls have been
reduced to one via CSE, then replaces with 0 when x is a power of 2
and N >= M.
There are a number of and folds that are repeated for both
operand orders. Move these into a helper that is invoked with
both orders.
This is conceptually NFC, but may not be entirely so, as the order
of folds may change.
This code was incorrectly checking that the CtxI has required FMF, but
the context instruction need not always be the instrinsic call.
Check that the intrinsic call has the required FMF.
Fixes PR71548.
We need to check FPMathOperator for Ctx instruction before checking fast
math flag on this Ctx.
Ctx is not always an FPMathOperator, so explicitly check for it.
Fixes#71548.
Relative to the first attempt, this contains two changes:
First, we only handle the case where one side simplifies to true or
false, instead of calling simplification recursively. The previous
approach would return poison if one operand simplified to poison
(under the equality assumption), which is incorrect.
Second, we do not fold llvm.is.constant in simplifyWithOpReplaced().
We may be assuming that a value is constant, if the equality holds,
but it may not actually be constant. This is nominally just a QoI
issue, but the std::list implementation in libstdc++ relies on the
precise behavior in a way that causes miscompiles.
-----
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.
Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).
The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.
The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.
Fixes#69050.
Fixes#69091.
Mostly the same as `and`. We also have a check for a useless
`llvm.ptrmask` if the ptr is already known aligned.
Differential Revision: https://reviews.llvm.org/D156633
and/or in logical (select) form benefit from generic simplifications via
simplifyWithOpReplaced(). However, the corresponding fold for plain
and/or currently does not exist.
Similar to selects, there are two general cases for this fold
(illustrated with `and`, but there are `or` conjugates).
The basic case is something like `(a == b) & c`, where the replacement
of a with b or b with a inside c allows it to fold to true or false.
Then the whole operation will fold to either false or `a == b`.
The second case is something like `(a != b) & c`, where the replacement
inside c allows it to fold to false. In that case, the operand can be
replaced with c, because in the case where a == b (and thus the icmp is
false), c itself will already be false.
As the test diffs show, this catches quite a lot of patterns in existing
test coverage. This also obsoletes quite a few existing special-case
and/or of icmp folds we have (e.g. simplifyAndOrOfICmpsWithLimitConst),
but I haven't removed anything as part of this patch in the interest of
risk mitigation.
Fixes#69050.
Fixes#69091.