Relands #114356. Compared to the last version, this patch only merges
poison-generating/nsz flags from the select to fix LV regression in
`llvm/test/Transforms/PhaseOrdering/AArch64/predicated-reduction.ll`.
We can turn:
```
%add = add i8 %arg, C1
%and = and i8 %add, C2
%cmp = icmp eq i1 %and, C3
```
into:
```
%and = and i8 %arg, C2
%cmp = icmp eq i1 %and, (C3 - C1) & C2
```
This is only worth doing if the sequence is the sole user of the addition
operation.
We left some easy opportunities for further simplifications.
log2(trunc(x)) is simply trunc(log2(x)). This is safe if we know that
trunc is NUW because it means that the truncation didn't drop any bits.
It is also safe if the caller is OK with zero as a possible answer.
log2(x >>u y) is simply `log2(x) - y`.
log2(x & y) is a funny one. It comes up when doing something like:
```
unsigned int f(unsigned int x, unsigned int y) {
unsigned char a = 1u << x;
return y / a;
}
```
LLVM would canonicalize this to:
```
%shl = shl nuw i32 1, %x
%conv1 = and i32 %shl, 255
%div = udiv i32 %y, %conv1
```
In cases like these, we can ignore the mask entirely.
This is equivalent to `y >> x`.
Type::isScalableTy and StructType::containsScalableVectorType failed to
detect some cases of structs containing scalable vectors because
containsScalableVectorType did not call back into isScalableTy to check
the element types. Fix this, which requires sharing the same Visited set
in both functions. Also change the external API so that callers are
never required to pass in a Visited set, and normalize the naming to
isScalableTy.
In a variety of places we change the bitwidth of a parameter but don't
update the attributes.
The issue in this case is from the `range` attribute when inlining
`__memset_chk`. `optimizeMemSetChk` will replace an `i32` with an
`i8`, and if the `i32` had a `range` attr assosiated it will cause an
error.
Fixes#112633
Add a helper for shared folds between logical and bitwise and/or
and move the and/or of icmp and fcmp folds in there. This makes
it easier to extend to more folds.
A possible extension would be to base the current and/or of icmp
reassociation logic on this helper, so that it for example also
applies to fcmp.
InstCombinerImpl::foldSelectInstWithICmp has some inlined code for
select-icmp-xor simplification, but this simplification is already done
by other code, via another path:
(X & Y) == 0 ? X : X ^ Y ->
((X & Y) == 0 ? 0 : Y) ^ X ->
(X & Y) ^ X ->
X & ~Y
Cover the cases that it claims to simplify, and demonstrate that
stripping it doesn't cause test changes.
In
5dbfca30c1
we assume that RHS is poison implies LHS is also poison. It doesn't hold
after introducing samesign flag.
This patch drops the `samesign` flag on RHS if the original expression
is a logical and/or.
Closes#112467.
Today, InstCombine can fold fcmp+select patterns to minnum/maxnum
intrinsics when the nnan and nsz flags are set. The ordering of the
operands in both the fcmp and select instructions is important for the
folding to occur.
maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ogt, oge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ule, ult}
The second pattern is supposed to make the order of the operands in the
select instruction irrelevant. However, the pattern matching code uses
the CmpInst::getInversePredicate method to invert the comparison
predicate. This method doesn't take into account the fast-math flags,
which can lead missing the folding opportunity.
The patch extends the pattern matching code to handle unordered fcmp
instructions. This allows the folding to occur even when the select
instruction has the operands in the inverse order.
New maxnum patterns:
1. (a op b) ? a : b -> maxnum(a, b), where op is one of {ugt, uge}
2. (a op b) ? b : a -> maxnum(a, b), where op is one of {ole, olt}
The same changes are applied to the minnum intrinsic.
foldSelectEquivalence currently doesn't support GVN-like replacements on
vector types. Put in the checks for potentially lane-crossing
operations, and lift the limitation.
Factor out and unify common code from InstSimplify and InstCombine that
partially guard against cross-lane vector operations into
llvm::isNotCrossLaneOperation in ValueTracking.
Alive2 proofs for changed tests: https://alive2.llvm.org/ce/z/68H4ka
Rename the function to reflect its correct behavior and to be consistent
with `Module::getOrInsertFunction`. This is also in preparation of
adding a new `Intrinsic::getDeclaration` that will have behavior similar
to `Module::getFunction` (i.e, just lookup, no creation).
This is another small but hopefully not performance negative step to
canonicalizing towards i8 geps. We looks for geps with a constant offset
base pointer of the form `gep (gep @glob, C1), x, C2` and expand the gep
instruction, so that the constant can hopefully be combined together (or
the x offset can be computed in common).
The disables folding for FP aggregates that are not poison/posZero
types, which is currently not supported. Note: To fully handle this
aggregates would also likely require teaching `computeKnownFPClass()` to
handle array and struct constants (which does not seem implemented
outside of zero init).
Extend decomposeBitTestICmp() to handle cases where the resulting
comparison is of the form `icmp (X & Mask) pred C` with non-zero
`C`. Add a flag to allow code to opt-in to this behavior and use it in
the "log op of icmp" fold infrastructure.
This addresses regressions from #97289.
Proofs: https://alive2.llvm.org/ce/z/hUhdbU
When InstCombine replaces an old instruction with a new instruction, it
copies !dbg and !annotation metadata from old to new. For some
InstCombine patterns we set a specific DILocation on the new instruction
prior to insertion, however, which more accurately reflects the new
instruction. This more specific DILocation may be overwritten on
insertion by a less appropriate one, resulting in a less correct line
mapping. This patch changes this behaviour to only copy the DILocation
from old to new if the new instruction has no existing DILocation (which
will always be the case for a new instruction unless InstCombine has
specifically set one).
This corresponds to the canonicalized form of some logic that was
seen in Swift-generated code for comparing optional pointers:
`(X==Z || Y==Z) ? (X==Z && Y==Z) : X==Y --> X==Y`
where `Z` was the constant `0`.
https://alive2.llvm.org/ce/z/J_3aa9
Alive2: https://alive2.llvm.org/ce/z/NJgBPL
The motivating case of this patch is to emit `andn` on RISC-V with zbb
for expressions like `(sub 63, X) & 63`.
When combining two geps into one by adding the offsets, we have
to take some care when intersecting the flags, because nusw flags
cannot be straightforwardly preserved.
Add a helper for this on GEPNoWrapFlags so we won't have to repeat
this logic in various places.
There was a discrepancy between how SimplifyDemandedBits and
computeKnownBits handled the Argument case. computeKnownBits()
would use information from range attributes even once the
recursion limit has been reached.
Fixes https://github.com/llvm/llvm-project/issues/110631.
This patch moves the existing trunc+extractlement -> extractelement+bitcast fold into a foldVecExtTruncToExtElt helper and extends the helper to handle trunc+lshr+extractelement cases as well.
Fixes#107404
The srem case of SimplifyDemandedUseBits partially duplicates
KnownBits::srem. It is guarded by a statement that takes the absolute
value of the RHS and checks whether it is a power of 2, but the abs()
call here useless, since an srem with a negative RHS is flipped into one
with a positive RHS, adjusting LHS appropriately. Stripping the abs call
allows us to call KnownBits::srem instead of partially duplicating it.
When dividing by -1 we were breaking out of the code entirely,
while we should fall through to computeKnownBits().
This fixes an instcombine-verify-known-bits discrepancy.
Fixes https://github.com/llvm/llvm-project/issues/109957.
decomposeBitTestICmp() currently returns the result via two out
parameters plus an in-place modification of Pred. This changes it to
return an optional struct instead.
The motivation here is twofold. First, I'd like to extend this code to
handle cases where the comparison is against a value other than zero,
which would mean yet another out parameter. Second, while doing that I
was badly bitten by the in-place modification, so I'd like to get rid of
it.
InstCombine already has some rules for `icmp ptrtoint, ptrtoint` to drop
the casts and compare the source values. This change adds the same for
the reverse case with `inttoptr`.