Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This patch moves the existing trunc+extractlement -> extractelement+bitcast fold into a foldVecExtTruncToExtElt helper and extends the helper to handle trunc+lshr+extractelement cases as well.
Fixes#107404
In some cases, if an undef value is the product of another instcombine
simplification, a bitcast of undef is simplified to a zeroinitializer
vector instead of undef.
Previously, (zext (icmp ne (and X, (1 << ShAmt)), 0)) has only been
folded if the bit width of X and the result were equal. Use a trunc or
zext instruction to also support other bit widths.
This is a follow-up to commit 533190acdb9d2ed774f96a998b5c03be3df4f857,
which introduced a regression: (zext (icmp ne (and (lshr X ShAmt) 1) 0))
is not folded any longer to (zext/trunc (and (lshr X ShAmt) 1)) since
the commit introduced the fold of (icmp ne (and (lshr X ShAmt) 1) 0) to
(icmp ne (and X (1 << ShAmt)) 0). The change introduced by this commit
restores this fold.
Alive proof: https://alive2.llvm.org/ce/z/MFkNXs
Relates to issue #86813 and pull request #101838.
This patch folds `(bitcast (or (and (bitcast X to int), signmask), nneg
Y) to fp)` into `copysign((bitcast Y to fp), X)`. I found this pattern
exists in some graphics applications/math libraries.
Alive2: https://alive2.llvm.org/ce/z/ggQZV2
It is now translated to `<1 x i64>`, which allows the removal of a bunch
of special casing.
This _incompatibly_ changes the ABI of any LLVM IR function with
`x86_mmx` arguments or returns: instead of passing in mmx registers,
they will now be passed via integer registers. However, the real-world
incompatibility caused by this is expected to be minimal, because Clang
never uses the x86_mmx type -- it lowers `__m64` to either `<1 x i64>`
or `double`, depending on ABI.
This change does _not_ eliminate the SelectionDAG `MVT::x86mmx` type.
That type simply no longer corresponds to an IR type, and is used only
by MMX intrinsics and inline-asm operands.
Because SelectionDAGBuilder only knows how to generate the
operands/results of intrinsics based on the IR type, it thus now
generates the intrinsics with the type MVT::v1i64, instead of
MVT::x86mmx. We need to fix this before the DAG LegalizeTypes, and thus
have the X86 backend fix them up in DAGCombine. (This may be a
short-lived hack, if all the MMX intrinsics can be removed in upcoming
changes.)
Works towards issue #98272.
This reverts commit c45f939e34dafaf0f57fd1d93df7df5cc89f1dec.
This refactoring turned out to not be useful for the case I had
originally in mind, so revert it for now.
We can transfer a nuw flag from the gep to the add. Additionally,
the inbounds + nneg case can be relaxed to nusw + nneg. Finally,
don't forget to pass the correct context instruction to
SimplifyQuery.
We can't preserve the context across a non-speculatable instruction,
as this might introduce a trap. Alternatively, we could also
insert all the replacement instruction at the use-site, but that
would be a more intrusive change for the sake of this edge case.
Fixes https://github.com/llvm/llvm-project/issues/95547.
Fold
``` llvm
define i32 @src(i32 %x, i32 %y) {
%base = inttoptr i32 %x to ptr
%ptr = getelementptr inbounds i8, ptr %base, i32 %y
%r = ptrtoint ptr %ptr to i32
ret i32 %r
}
```
where both `%base` and `%ptr` have only one use, to
``` llvm
define i32 @tgt(i32 %x, i32 %y) {
%r = add i32 %x, %y
ret i32 %r
}
```
The `add` can be `nuw` if the GEP is `inbounds` and the offset is
non-negative. The relevant Alive2 proof is
https://alive2.llvm.org/ce/z/nP3RWy.
### Motivation
It seems unnecessary to convert `int` to `ptr` just to get its offset.
In most cases, they generates the same assembly, but sometimes it may
miss some optimizations since the analysis of `GEP` is not as perfect as
that of arithmetic operation. One example is
e3c822bf41/bench/protobuf/optimized/generated_message_reflection.cc.ll (L39860-L39873)
``` llvm
%conv.i188 = zext i32 %145 to i64
%add.i189 = add i64 %conv.i188, %125
%146 = load i16, ptr %num_aux_entries10.i, align 2
%conv2.i191 = zext i16 %146 to i64
%mul.i192 = shl nuw nsw i64 %conv2.i191, 3
%add3.i193 = add i64 %add.i189, %mul.i192
%147 = inttoptr i64 %add3.i193 to ptr
%sub.ptr.lhs.cast.i195 = ptrtoint ptr %144 to i64
%sub.ptr.rhs.cast.i196 = ptrtoint ptr %143 to i64
%sub.ptr.sub.i197 = sub i64 %sub.ptr.lhs.cast.i195, %sub.ptr.rhs.cast.i196
%add.ptr = getelementptr inbounds i8, ptr %147, i64 %sub.ptr.sub.i197
%sub.ptr.lhs.cast = ptrtoint ptr %add.ptr to i64
%sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %125
```
where `%conv.i188` first adds `%125` and then subtracts `%125` (the
result is `%sub.ptr.sub`), which can be optimized.
In #88217 a large set of matchers was changed to only accept poison
values in splats, but not undef values. This is because we now use
poison for non-demanded vector elements, and allowing undef can cause
correctness issues.
This patch covers the remaining matchers by changing the AllowUndef
parameter of getSplatValue() to AllowPoison instead. We also carry out
corresponding renames in matchers.
As a followup, we may want to change the default for things like m_APInt
to m_APIntAllowPoison (as this is much less risky when only allowing
poison), but this change doesn't do that.
There is one caveat here: We have a single place
(X86FixupVectorConstants) which does require handling of vector splats
with undefs. This is because this works on backend constant pool
entries, which currently still use undef instead of poison for
non-demanded elements (because SDAG as a whole does not have an explicit
poison representation). As it's just the single use, I've open-coded a
getSplatValueAllowUndef() helper there, to discourage use in any other
places.
This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44.
It wasn't a particularly important transform to begin with and caused
some codegen regressions on targets that prefer `sitofp` so dropping.
Might re-visit along with adding `nneg` flag to `uitofp` so its easily
reversable for the backend.
This patch enables more optimization after canonicalizing `fmul X, 0.0`
into a copysign.
I decide to implement this fold in InstCombine because
`computeKnownFPClass` may be expensive.
Alive2: https://alive2.llvm.org/ce/z/ASM8tQ
This fixes the case where we would shrink an frem to half and then
bitcast to bfloat, producing invalid results. The transformation was
written under the assumption that there is only one type with a given
bit width.
Also add a strategic assert to CastInst::CreateFPCast to turn this
miscompilation into a crash.
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
An issue arose when handling shift amounts while performing
narrowed funnel shifts simplification. Specifically, shift
amounts were incorrectly truncated when their type was
narrower than the target bit width. This has been addressed
by zero-extending `ShAmt` in such cases.
Fixes: https://github.com/llvm/llvm-project/issues/71463.
Proof: https://alive2.llvm.org/ce/z/5draKz.
`and` is generally more supported so if we have a `ptrmask` anyways
might as well use `and`.
Differential Revision: https://reviews.llvm.org/D156640Closes#67166
The m_ZExtOrSExt / m_Trunc in the following code can match constant
expressions, which we don't want here. Make sure we bail out early
for non-immediate constants.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
This regression triggers after commit f400daa to fix infinite loop
issue.
In this case, we can known the shift count is 0, so it will not be
triggered by the form of (iN (~X) u>> (N - 1)) in commit 21d3871, of
which N indicates the data type bitwidth of X.
Fixes https://github.com/llvm/llvm-project/issues/68465.
As per my proposal for how to eliminate debug intrinsics [0], for various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. These call-sites where I've changed the
spelling are those that necessary to build a stage2clang to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Replace these with IRBuilder uses, as we don't (from a type
perspective) care about Constant results.
Switch the predicate to m_ImmConstant() instead of isa<Constant>
to guarantee that these do get folded away and our assumptions
about simplifications hold true.