When using PatternMatch, there is a common problem where we want to both
match something against a pattern and capture the value/instruction for
various reasons (e.g. to access flags).
Currently, the two ways to do that are to either capture using
m_Value/m_Instruction and do a separate match on the result, or to use
the somewhat awkward `m_CombineAnd(m_XYZ, m_Value(V))` pattern.
This PR adds a variant of `m_Value`/`m_Instruction` that does both a
capture and a match: `m_Value(V, m_XYZ)` is equivalent to
`m_CombineAnd(m_XYZ, m_Value(V))`.
I've ported two InstCombine files to this pattern as a sample.
The motivation for this pattern is to check whether the product of a
variable and a constant would be mathematically (i.e., as integers
rather than as bit vectors) greater than a given constant bound. The
pattern appears to occur when compiling several Rust projects (it seems
to originate from the `smallvec` crate but I have not checked this
further).
Unless `c1` is `0`, we can transform this pattern into `x > c2/c1` with
all operations working on unsigned integers. Due to undefined behavior
when an element of a non-splat vector is `0`, the transform is only
implemented for scalars and splat vectors.
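As a concrete illustration, with c1 = 10 and c2 = 100 and the
"mathematical" product expressed via the unsigned-overflow intrinsic
(the exact IR shape matched by the patch may differ; these constants
are arbitrary):
```
declare { i8, i1 } @llvm.umul.with.overflow.i8(i8, i8)

define i1 @src(i8 %x) {
  %res = call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %x, i8 10)
  %mul = extractvalue { i8, i1 } %res, 0
  %ov = extractvalue { i8, i1 } %res, 1
  ; true iff x * 10 > 100 as mathematical integers
  %big = icmp ugt i8 %mul, 100
  %r = or i1 %ov, %big
  ret i1 %r
}

define i1 @tgt(i8 %x) {
  ; c2/c1 == 100 u/ 10 == 10
  %r = icmp ugt i8 %x, 10
  ret i1 %r
}
```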
Alive proof: https://alive2.llvm.org/ce/z/LawTkm
Closes #142674.
The canonical pattern for bitmasked mul is currently
```
%val = and %x, %bitMask ; where %bitMask is some constant
%cmp = icmp eq %val, 0
%sel = select %cmp, 0, %C ; where %C is some constant, with %C == C' * %bitMask
```
In certain cases, where we are combining multiple of these bitmasked
muls with common factors, we are able to optimize them into and->mul (see
https://github.com/llvm/llvm-project/pull/135274).
This optimization lends itself to further optimizations. This PR
addresses one such optimization.
In cases where we have
`or disjoint (mul (and (X, C1), D), mul (and (X, C2), D))`
we can combine into
`mul (and (X, (C1 + C2)), D)`
provided C1 and C2 are disjoint.
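For example, with C1 = 1, C2 = 2 (disjoint), and D = 5 (a hand-written
instance of the formula above, not taken from the patch's tests):
```
define i8 @src(i8 %x) {
  %a1 = and i8 %x, 1
  %m1 = mul i8 %a1, 5
  %a2 = and i8 %x, 2
  %m2 = mul i8 %a2, 5
  ; %m1 and %m2 have no bits in common ({0,5} vs. {0,10})
  %or = or disjoint i8 %m1, %m2
  ret i8 %or
}

define i8 @tgt(i8 %x) {
  ; C1 + C2 == 3
  %a = and i8 %x, 3
  %m = mul i8 %a, 5
  ret i8 %m
}
```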
Generalized proof: https://alive2.llvm.org/ce/z/MQYMui
Consider the following case:
```
define i1 @src(i8 %x) {
%cmp = icmp slt i8 %x, -1
%not1 = xor i1 %cmp, true
%or = or i1 %cmp, %not1
%not2 = xor i1 %or, true
ret i1 %not2
}
```
`sinkNotIntoLogicalOp(%or)` calls `freelyInvert(%cmp,
/*IgnoredUser=*/%or)` first. However, as `%cmp` is also used by `Op1 =
%not1`, the RHS of `%or` is set to `%cmp.not = xor i1 %cmp, true`. Thus
`Op1` is stale by the time of the second call to `freelyInvert`.
Similarly, the second call may change `Op0`. Based on this analysis, I
decided to avoid this fold when one of the operands is also a user of
the other.
Closes https://github.com/llvm/llvm-project/issues/142518.
Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but it is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either by using
a clever caching technique or by writing an easily-invalidable KnownBits
analysis, make the Depth argument in the ValueTracking APIs uniformly
the last argument, with a default value. This will aid in removing the
argument when the time comes, as the many callers that previously passed
0 explicitly have now been updated to omit the argument altogether.
While combining and->cmp->sel into and->mul may result in worse code on
some targets, this combine should be uniformly beneficial.
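As a reminder of the equivalence being exploited, here is an
illustrative instance with %bitMask = 4, %C' = 3, %C = 12 (note the
select and the mul only agree because %bitMask has a single set bit):
```
define i8 @src(i8 %x) {
  %val = and i8 %x, 4
  %cmp = icmp eq i8 %val, 0
  %sel = select i1 %cmp, i8 0, i8 12
  ret i8 %sel
}

define i8 @tgt(i8 %x) {
  %val = and i8 %x, 4      ; %val is 0 or 4
  %mul = mul i8 %val, 3    ; so %mul is 0 or 12
  ret i8 %mul
}
```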
Proof: https://alive2.llvm.org/ce/z/MibAcN
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com>
Add `GenericFloatingPointPredicateUtils` in order to generalize the
effects of floating-point comparisons on `KnownFPClass` for both IR and
MIR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Optimize
`or disjoint (zext/sext a), (zext/sext b)`
to
`zext/sext (or disjoint a, b)`
without losing the disjoint flag.
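For instance, in the scalar zext case (sext is analogous; zext
preserves bit positions, so the operands of the narrow or are disjoint
exactly when the wide ones are):
```
define i16 @src(i8 %a, i8 %b) {
  %za = zext i8 %a to i16
  %zb = zext i8 %b to i16
  %r = or disjoint i16 %za, %zb
  ret i16 %r
}

define i16 @tgt(i8 %a, i8 %b) {
  %or = or disjoint i8 %a, %b
  %r = zext i8 %or to i16
  ret i16 %r
}
```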
Confirmed here: https://alive2.llvm.org/ce/z/kQ5fJv.
This refactoring makes it easier to add the code that preserves the
disjoint flag in PR https://github.com/llvm/llvm-project/pull/136815.
Both casts must have one use for folding a logic op with sext/zext when
the source types differ, to avoid creating an extra instruction. If the
source types of the casts are the same, only one of the casts needs to
have one use. This PR also adds more tests for the same-source-type
case.
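A sketch of the differing-source-type case (illustrative types, not
taken from the patch's tests): folding into the narrower type requires
a new cast of the narrower operand, so it only pays off if both
original casts become dead.
```
define i32 @src(i8 %a, i16 %b) {
  %za = zext i8 %a to i32
  %zb = zext i16 %b to i32
  %or = or i32 %za, %zb
  ret i32 %or
}

define i32 @tgt(i8 %a, i16 %b) {
  %aw = zext i8 %a to i16   ; the extra instruction
  %or = or i16 %aw, %b
  %r = zext i16 %or to i32
  ret i32 %r
}
```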
This patch disables the fold for the logical is_finite test (i.e., `and
(fcmp ord x, 0), (fcmp u* x, inf) -> fcmp o* x, inf`).
It is still possible to allow this fold in several logical cases (e.g.,
when `stripSignOnlyFPOps(RHS0)` does not strip any operations). Since
this patch has no real-world impact, I decided to disable this fold for
all logical cases.
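For illustration, instantiating u*/o* with ult/olt (+inf written in
LLVM's hex float notation); the bitwise form still folds, while the
select-based logical form no longer does:
```
define i1 @bitwise(float %x) {
  %ord = fcmp ord float %x, 0.0
  %ult = fcmp ult float %x, 0x7FF0000000000000
  %r = and i1 %ord, %ult        ; -> fcmp olt float %x, +inf
  ret i1 %r
}

define i1 @logical(float %x) {
  %ord = fcmp ord float %x, 0.0
  %ult = fcmp ult float %x, 0x7FF0000000000000
  %r = select i1 %ord, i1 %ult, i1 false   ; fold disabled here
  ret i1 %r
}
```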
Alive2: https://alive2.llvm.org/ce/z/aH4LC7
Closes https://github.com/llvm/llvm-project/issues/136650.
Minor tweak to #129363, which handled all the cases where there was a
sext for the original source value, but not the cases where the source
is already half the size of the destination type.
Another regression noticed in #76524.
Alive2: https://alive2.llvm.org/ce/z/6zLAYp
Note: We can also apply this fix to the logic below (`if (Mask &
AMask_NotAllOnes)`), but it seems unreachable.
With the introduction of CmpPredicate in 51a895a (IR: introduce struct
with CmpInst::Predicate and samesign), PatternMatch is one of the first
key pieces of infrastructure that must be updated to match a CmpInst
respecting samesign information. Implement this change to Cmp-matchers.
This is a preparatory step in migrating the codebase over to
CmpPredicate. Since no functional changes are desired at this stage, we
have chosen not to migrate CmpPredicate::operator==(CmpPredicate) calls
to use CmpPredicate::getMatching(), as that would have a visible impact
on tests that are not yet written: instead, we call
CmpPredicate::operator==(Predicate), preserving the old behavior, while
also inserting a few FIXME comments for follow-ups.
We currently support simple reassociation for foldAndOrOfICmps().
Support the same for foldLogicOfFCmps() by going through the common
foldBooleanAndOr() helper.
This will also resolve the regression in #112704, which is likewise due
to missing reassociation support.
I had to adjust one fold to add support for FMF flag preservation,
otherwise there would be test regressions. There is a separate fold
(reassociateFCmps) handling reassociation for *just* that specific case
and it preserves FMF. Unfortunately it's not rendered entirely redundant
by this patch, because it handles one more level of reassociation as
well.
Currently, when InstCombineAndOrXor recognizes a bswap idiom and
replaces it with an intrinsic and other instructions, only the last
instruction receives the DebugLoc of the replaced instruction. This
patch applies the DebugLoc to all the generated instructions, to
maintain some degree of attribution.
Add a helper for shared folds between logical and bitwise and/or
and move the and/or of icmp and fcmp folds in there. This makes
it easier to extend to more folds.
A possible extension would be to base the current and/or of icmp
reassociation logic on this helper, so that it also applies to fcmp,
for example.
In 5dbfca30c1 we assumed that the RHS being poison implies the LHS also
being poison. This no longer holds after the introduction of the
samesign flag.
This patch drops the `samesign` flag on the RHS if the original
expression is a logical and/or.
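A hand-written sketch of why the assumption breaks (not taken from the
patch's tests): here %rhs is poison whenever %x is negative, because
samesign is violated against the non-negative constant, regardless of
whether %lhs is poison.
```
define i1 @logical_and(i8 %x, i8 %y) {
  %lhs = icmp ult i8 %x, %y
  %rhs = icmp samesign ult i8 %x, 5
  ; RHS poison does not imply LHS poison, so samesign must be
  ; dropped from %rhs before turning this into a bitwise and
  %r = select i1 %lhs, i1 %rhs, i1 false
  ret i1 %r
}
```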
Closes #112467.