foldAllocaCmp() needs to fold all comparisons of an alloca at the
same time, to ensure that there is a consistent view of the alloca
address. Currently, it folds "all" comparisons by limiting to the
case where there is only one. This patch switches the algorithm to
instead actually collect and fold all comparisons.
Something we need to be careful about here is that there may be
comparisons where both sides of the icmp are based on the alloca.
Such comparisons are comparing offsets of the alloca, and as such
can be ignored here, but shouldn't be folded to false.
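A minimal sketch of the two situations (hypothetical IR, not taken from the patch): the first two compares are against an unrelated pointer and must be folded as a group, while the last one compares two offsets of the same alloca and is simply skipped.

define i1 @example(ptr %p) {
  %a = alloca [8 x i8]
  %g1 = getelementptr i8, ptr %a, i64 1
  %g2 = getelementptr i8, ptr %a, i64 2
  %c1 = icmp eq ptr %a, %p        ; based on the alloca address
  %c2 = icmp eq ptr %g1, %p       ; also based on the alloca address
  %c3 = icmp ult ptr %g1, %g2     ; both sides based on the alloca: ignore, don't fold to false
  %r1 = or i1 %c1, %c2
  %r2 = or i1 %r1, %c3
  ret i1 %r2
}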
Differential Revision: https://reviews.llvm.org/D144492
The outputs of intrinsics like ctpop, cttz and ctlz have a limited range, from 0 to the bit width of their operand. So if the truncate's destination type can hold values up to the source bit width, we can simply look through the truncate and use its source operand in the combine.
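For illustration, a minimal sketch (hypothetical example, not from the patch): ctpop on an i32 returns a value in [0, 32], which an i8 can hold, so the combine can reason about %pop directly and ignore the trunc.

define i8 @example(i32 %x) {
  %pop = call i32 @llvm.ctpop.i32(i32 %x)   ; range [0, 32]
  %t = trunc i32 %pop to i8                 ; lossless: 32 fits in i8
  ret i8 %t
}
declare i32 @llvm.ctpop.i32(i32)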
Alive2 proofs:
https://alive2.llvm.org/ce/z/9D_-qP
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D143368
is.fpclass(x, qnan|snan) -> fcmp uno x, 0.0
is.fpclass(nnan x, qnan|snan|other) -> is.fpclass(x, other)
Start porting the existing combines from llvm.amdgcn.class to the
generic intrinsic. Start with the ones which aren't dependent on the
FP mode.
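A minimal sketch of the first fold (hypothetical, assuming the documented test-mask bits where snan|qnan is 3): a pure NaN test becomes an unordered compare.

define i1 @example(float %x) {
  %isnan = call i1 @llvm.is.fpclass.f32(float %x, i32 3)  ; snan|qnan
  ; -> %isnan = fcmp uno float %x, 0.0
  ret i1 %isnan
}
declare i1 @llvm.is.fpclass.f32(float, i32)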
There's no reason to use "CI" (cast instruction) when
we know that the value is a more specific (exact) type
of instruction (although we might want to common-ize some
of this code to eliminate duplication or logic diffs).
It's also visually difficult to distinguish between "CI",
"ICI", and "IC" acronyms (and those could change meaning
depending on context).
This was partially changed in earlier commits, so this
makes this pair of functions consistent.
Tries to perform
(lshr (add (zext X), (zext Y)), K)
-> (icmp ult (add X, Y), X)
where
- The add's operands are zexts from a K-bits integer to a bigger type.
- The add is only used by the shr, or by iK (or narrower) truncates.
- The lshr type has more than 2 bits (other types are boolean math).
- K > 1
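A minimal sketch of the pattern with K = 32 (hypothetical, not one of the patch's tests): the high word of the widened sum is exactly the carry of the narrow add, i.e. zext(a + b < a).

define i64 @example(i32 %a, i32 %b) {
  %za = zext i32 %a to i64
  %zb = zext i32 %b to i64
  %sum = add i64 %za, %zb
  %carry = lshr i64 %sum, 32
  ; -> %n = add i32 %a, %b ; %ov = icmp ult i32 %n, %a ; %carry = zext i1 %ov to i64
  ret i64 %carry
}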
This seems to be a pattern that just comes from OpenCL front-ends, so adding DAG/GISel combines doesn't seem to be worth the complexity.
Original patch D107552 by @abinavpp - adapted to use (a + b < a) instead of uaddo following discussion on the review.
See this issue https://github.com/RadeonOpenCompute/ROCm/issues/488
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D138814
The important bit here is that we gracefully handle other uses,
iff they can be adapted to inversion.
I'll note that the previous logic was actively bad:
it increased the instruction count since it didn't actually ensure
that the inversions happened.
Matches what we do for binary operations, but special care
is needed to preserve operand order, as the logical operations
are not strictly commutative!
The class in InstCombineInternal.h inherits from the class declared in InstCombiner.h.
I think this split was created when target specific InstCombines
were moved to go through TTI.
I had to update some of the code in InstCombiner.h to match changes
that had been made to InstCombineInternal.h.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D140230
Currently, InstCombine folds that use the
`return replaceInstUsesWith(V, Builder.CreateFoo())`
pattern do not preserve the original name of the instruction.
To preserve the name, you either have to use something like
`return FooInst::Create(...)` which is usually less nice, or go
out of the way to preserve the name with takeName(). We often
don't do that.
This patch instead preserves the name in replaceInstUsesWith()
when replacing a named instruction with an unnamed instruction.
To be conservative, I also added a zero-use check, which is a
proxy for the case where the instruction was just created, rather
than an existing one reused. Possibly we could drop that part.
As InstCombine tests are robust against renames this does not
cause any test diffs, so I regenerated a random test to show the
effects.
Differential Revision: https://reviews.llvm.org/D140192
This reverts commit 492c471839a66e354ebe696bd3e15f7477c63613.
As pointed out by nlopes, the transform in f2 is not correct: If
%shr is poison, then freeze may result in a negative value. The
transform is correct in the case where the freeze is pushed through
the operation in a way that guarantees the result is non-negative,
which is the case I had tested.
Move logical operators on pairs of llvm.is.fpclass on the same value
into the test mask of a single is_fpclass.
or (class x, mask0), (class x, mask1) -> class x, (mask0 | mask1)
and (class x, mask0), (class x, mask1) -> class x, (mask0 & mask1)
xor (class x, mask0), (class x, mask1) -> class x, (mask0 ^ mask1)
The and/or cases should appear frequently in the builtin math
libraries; haven't seen the xor case but handle it for completeness.
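A minimal sketch of the or case (hypothetical masks, assuming snan|qnan = 3 and -inf|+inf = 516): both calls test the same value, so the masks simply combine.

define i1 @example(float %x) {
  %nan = call i1 @llvm.is.fpclass.f32(float %x, i32 3)     ; snan|qnan
  %inf = call i1 @llvm.is.fpclass.f32(float %x, i32 516)   ; -inf|+inf
  %r = or i1 %nan, %inf
  ; -> %r = call i1 @llvm.is.fpclass.f32(float %x, i32 519)
  ret i1 %r
}
declare i1 @llvm.is.fpclass.f32(float, i32)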
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands is the result of a first-vector-element splat, can be
simplified to a BinOp of first-vector-element splats of the operands.
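A minimal sketch (hypothetical, not the patch's test): since lane 0 of %add only depends on lane 0 of each operand, splatting it is the same as adding the two first-element splats.

define <4 x i32> @example(<4 x i32> %x, <4 x i32> %y) {
  %xs = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> zeroinitializer
  %add = add <4 x i32> %xs, %y
  %res = shufflevector <4 x i32> %add, <4 x i32> poison, <4 x i32> zeroinitializer
  ; -> %ys = shufflevector <4 x i32> %y, <4 x i32> poison, <4 x i32> zeroinitializer
  ;    %res = add <4 x i32> %xs, %ys
  ret <4 x i32> %res
}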
Differential Revision: https://reviews.llvm.org/D135876
Rather than inserting a ptrtoint + inttoptr pair, directly replace
the inttoptr with the new phi node. This ensures that no other
transform can undo it before the pair gets folded away.
This avoids the infinite loop when combined with D134954.
This is NFCI in the sense that it shouldn't make a difference, but
could due to different worklist order.
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.
To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
We really want to push freezes through recurrence phis, so that we
freeze only the start value, rather than the IV value on every
iteration. foldOpIntoPhi() already handles this for the case where
the transfer function doesn't produce poison, e.g.
%iv.next = add %iv, 1. However, this does not work if nowrap flags
are present, e.g. the very common %iv.next = add nuw %iv, 1 case.
This patch adds a fold that pushes freeze instructions to the start
value by checking whether all backedge values will be non-poison
after poison generating flags have been dropped. This allows pushing
freezes out of loops in most cases. I suspect that this also
obsoletes the CanonicalizeFreezeInLoops pass, and we can probably
drop it.
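A minimal sketch of the common case (hypothetical IR): once nuw is dropped from the increment, the backedge value cannot reintroduce poison, so only the start value needs to be frozen.

define i32 @example(i32 %start, i32 %n) {
entry:
  br label %loop
loop:
  %iv = phi i32 [ %start, %entry ], [ %iv.next, %loop ]
  %iv.fr = freeze i32 %iv
  %iv.next = add nuw i32 %iv, 1
  %cmp = icmp ult i32 %iv.fr, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret i32 %iv.fr
}
; -> freeze %start once before the loop, drop nuw from the add, and the
;    freeze inside the loop goes away.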
Fixes https://github.com/llvm/llvm-project/issues/56048.
Differential Revision: https://reviews.llvm.org/D127960
Similarly to a change recently done for fcmps, add a flag to
foldAndOrOfICmps that indicates whether the and/or is logical, and
reuse the function when folding logical and/or.
We were already calling some parts of it, but this gives us a
clearer indication of which parts may need poison-safe variants,
and would also allow folding combinations of bitwise and logical
and/or.
This change should be close to NFC, because all folds this enables
were either already called previously, or can make use of implied
poison reasoning.
In this patch we add a function foldICmpInstWithConstantAllowUndef
to fold integer comparisons with a constant operand: icmp Pred X, C
where X is some kind of instruction and C is a constant that may
contain undef elements.
We move this fold to the new function, so that it can handle undef elements in vector constants.
Reviewed By: spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D125220
foldICmpInstWithConstant is a long function,
so we split a separate function foldICmpBinOpWithConstant out of it.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D125457
If there is a freeze %x, we currently replace all other uses of %x
with freeze %x -- as long as they are dominated by the freeze
instruction. This patch extends this behavior to cases where we
did not originally dominate the use by moving the freeze
instruction directly after the definition of the frozen value.
The motivation can be seen in test @combine_and_after_freezing_uses:
Canonicalizing everything to freeze %x allows folds that are based
on value identity (i.e. same operand occurring in two places) to
trigger. This also covers the case from D125248.
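A minimal sketch (hypothetical, simpler than the test mentioned above): the first mul is not dominated by the freeze, but after moving the freeze directly after the definition of %x, both muls use %fr and become identical, so identity-based folds can fire.

define i32 @example(i32 %a, i32 %b) {
  %x = add nsw i32 %a, %b     ; may be poison
  %u1 = mul i32 %x, 3         ; not dominated by the freeze below
  %fr = freeze i32 %x
  %u2 = mul i32 %fr, 3
  %r = add i32 %u1, %u2
  ret i32 %r
}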
Differential Revision: https://reviews.llvm.org/D125321
This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).
Folds are supposed to always be added in conjugated pairs for and
and or. Merge the two functions to make folds for which this is
currently not the case more obvious.
By adding a parameter to FoldOpIntoSelect, we can fold more ops into selects.
In this case we want to fold the division instruction,
so we no longer care whether the SelectInst has only one use.
This patch solves the TODO left in InstCombine/div.ll.
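A minimal sketch of the kind of case this enables (hypothetical, not the exact test from div.ll): the divisor is a select of constants with an extra use, and the division can still be folded into the select.

define i32 @example(i32 %x, i1 %c, ptr %p) {
  %s = select i1 %c, i32 16, i32 8
  store i32 %s, ptr %p          ; extra use: %s is not one-use
  %d = udiv i32 %x, %s
  ; -> fold the udiv into the select: select i1 %c, (udiv %x, 16), (udiv %x, 8),
  ;    where the divisions by constants then become shifts.
  ret i32 %d
}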
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122967
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.
This patch adds explicit checks for this case that don't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.
FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.
This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.
void
test1 (float x, float y)
{
  if ((x==y) && (x!=y))
    link_error0();
}
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D121323
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.