This is very likely the cause of a stage 2 failure in
Transforms/LoopVectorize/check-prof-info.ll. Revert until I can
investigate this.
This reverts commit 3d199d086e076f0b9b90d4c59f2226a417a639b5.
Support replacement of operands not only in the immediate
instruction, but also instructions it uses.
For the most part, this extension is straightforward, but there are
two bits worth highlighting:
First, we can no longer assume that if Op is a vector, the
instruction also returns a vector. If Op is a vector and the
instruction returns a scalar, we should consider it as a cross-lane
operation.
Second, for the x ^ x special case, we can no longer assume that
the operand is RepOp, as we might have a replacement higher up the
instruction chain.
There is one optimization regression, but it is in a fuzzer-generated
test case.
Fixes https://github.com/llvm/llvm-project/issues/63104.
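As a hedged illustration of the new capability (a hypothetical example, not
a test from the patch):
define i8 @replace_in_used_instruction(i8 %x, i8 %y, i8 %z) {
  %cmp = icmp eq i8 %x, 0
  %mul = mul i8 %x, %z
  %add = add i8 %mul, %y
  %sel = select i1 %cmp, i8 %add, i8 %y
  ret i8 %sel
}
; In the true arm %x is known to be 0, so %mul folds to 0 and %add to %y;
; the select then simplifies to %y. Previously only the select's immediate
; operand %add was considered for replacement of %x.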
After the semantics change from https://reviews.llvm.org/D154051,
gep inbounds x, 0 can no longer produce poison. As such, we can
also perform this fold during non-refining operand replacement
and avoid unnecessary drops of the inbounds flag.
The online alive2 version has not been updated to the new
semantics yet, but we can use the following proof locally:
define ptr @src(ptr %base, i64 %offset) {
  %cmp = icmp eq i64 %offset, 0
  %gep = getelementptr inbounds i8, ptr %base, i64 %offset
  %sel = select i1 %cmp, ptr %base, ptr %gep
  ret ptr %sel
}

define ptr @tgt(ptr %base, i64 %offset) {
  %gep = getelementptr inbounds i8, ptr %base, i64 %offset
  ret ptr %gep
}
With the semantics change from D154051, it is no longer valid to
fold gep inbounds undef to poison (unless we know the index is
non-zero). Fold it to undef instead.
Differential Revision: https://reviews.llvm.org/D154215
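As an illustration (hypothetical, not taken from the patch's tests), with the
index not known to be non-zero:
define ptr @gep_of_undef(i64 %offset) {
  %gep = getelementptr inbounds i8, ptr undef, i64 %offset
  ret ptr %gep
}
; This now folds to ret ptr undef rather than ret ptr poison, since with the
; D154051 semantics a zero-offset inbounds gep of undef is not poison.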
Strengthen the fold for icmps of non-overlapping storage, by
working on the difference of offsets, rather than considering
both offsets independently. In particular, this allows handling
comparisons of pointers to the end of equal-sized allocations.
Proofs: https://alive2.llvm.org/ce/z/Po2nL4
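For illustration, a hypothetical reduced case of the newly handled pattern
(not taken from the patch's tests):
define i1 @cmp_end_pointers() {
  %a = alloca [4 x i8]
  %b = alloca [4 x i8]
  %a.end = getelementptr inbounds i8, ptr %a, i64 4
  %b.end = getelementptr inbounds i8, ptr %b, i64 4
  %cmp = icmp eq ptr %a.end, %b.end
  ret i1 %cmp
}
; The offset difference is 0 and the allocations do not overlap, so the end
; pointers can only be equal if the allocations were equal; folds to false.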
Differential Revision: https://reviews.llvm.org/D153752
D143505 fixed/simplified folding of operations with SNaN operands. In
doing so it introduced a crash when handling scalable vector types,
wherein the scalable-vector ConstantVector was cast to a ConstantFP.
Since we know by that point in the code that if we've found a NaN, we're
dealing with a scalable-vector splat (as there are no other kinds of
scalable-vector constant for which that holds), we can grab the splatted
value and re-use the existing code, which will automatically splat the
new NaN back to a scalable vector for us.
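A reduced example of the kind of input that crashed (written here with the
current splat constant syntax; illustrative, not the exact test from the
patch):
define <vscale x 2 x double> @fadd_snan_splat(<vscale x 2 x double> %x) {
  ; 0x7FF4000000000000 is a signaling NaN; the operand is a scalable splat
  %r = fadd <vscale x 2 x double> %x, splat (double 0x7FF4000000000000)
  ret <vscale x 2 x double> %r
}
; This now simplifies to a splat of the quieted NaN instead of crashing.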
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D153566
The way this is currently implemented, the accumulated offsets can
end up having different sizes, which causes unnecessary
complications for further extensions of the code.
Don't strip pointer casts at the start; instead, rely on
stripAndAccumulate to do any necessary stripping. It gracefully
handles different index sizes and will always retain the width of
the original pointer index type.
This is not NFC, but unlikely to make any practical difference.
This reverts commit 935c8b6f3a4dda0ff881ed86faaad9fe5b276d70.
Going to fix the size regression forward instead, since otherwise more dependent patches would need to be reverted.
This reverts commit 0c03f48480f69b854f86d31235425b5cb71ac921.
Going to fix the size regression forward instead, since otherwise more dependent patches would need to be reverted.
`select i1 non-const, i1 true, i1 false` has been optimized to
`non-const`. There is no reason we cannot also optimize `select i1
ConstExpr, i1 true, i1 false` to `ConstExpr`.
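As an illustrative shape (using an icmp constant expression, as existed at
the time of this change; hypothetical, not the exact test):
@g = external global i8

define i1 @select_constexpr_cond() {
  %sel = select i1 icmp eq (i64 ptrtoint (ptr @g to i64), i64 4096), i1 true, i1 false
  ret i1 %sel
}
; Folds to the condition itself:
;   ret i1 icmp eq (i64 ptrtoint (ptr @g to i64), i64 4096)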
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D151631
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case that no other code really
needs to worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. This results in some IR ordering differences, since
cloned blocks are now inserted eagerly. Additionally, some
simplifications now trigger that previously didn't (e.g. some
adds of 0 are now folded out).
This reverts commit 0012b94a4e8e0c757ef0adcd68fb61bb0318b26c.
Reverting due to test failures introduced by 73925ef8b0eacc6792f0e3ea21a3e6d51f5ee8b0
Updated floating-point-compare.ll to keep the assume declaration.
This is generally handled already in early CSE.
If a specialized pipeline is used, however, it's possible for an `i1`
division with a known-zero denominator to slip through. Generally the
known-zero denominator is caught and poison is returned, but if it is
indirect enough (known zero through a phi node) we can miss this case
in `InstructionSimplify` and then miss handling `i1`. This is because
`i1` is currently handled with the following check:
`if (Known.countMinLeadingZeros() == Known.getBitWidth() - 1)`
which only works on the assumption that we don't know the denominator
to be zero. If we know the denominator to be zero, this check fails:
https://github.com/llvm/llvm-project/issues/62607
This patch simply adds an explicit `if (Known.isZero()) return poison;`
which fixes the issue.
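A reduced illustration of the shape of the problem (hypothetical, not the
reproducer from the issue):
define i1 @div_by_phi_zero(i1 %c, i1 %x) {
entry:
  br i1 %c, label %a, label %b
a:
  br label %join
b:
  br label %join
join:
  ; the denominator is only known to be zero via the phi
  %den = phi i1 [ false, %a ], [ false, %b ]
  %div = udiv i1 %x, %den
  ret i1 %div
}
; With the explicit known-zero check, %div simplifies to poison.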
Alive2 Link for tests:
https://alive2.llvm.org/ce/z/VTw54n
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D150142
Following the change in shufflevector semantics,
poison will be used to represent undefined elements in shufflevector masks.
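For example, a mask with undefined elements is now written with poison
(illustrative only):
define <4 x i32> @mask_with_poison(<4 x i32> %a, <4 x i32> %b) {
  %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 poison, i32 2, i32 poison>
  ret <4 x i32> %shuf
}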
Differential Revision: https://reviews.llvm.org/D149256
Use simplifySelectWithICmpEq to handle the implied equalities from the icmp-and,
so that both ICMP_NE and ICMP_EQ are handled, including vector types:
(X & Y) == -1 ? X : -1 --> -1 (commuted 2 ways)
(X & Y) != -1 ? -1 : X --> -1 (commuted 2 ways)
This is a supplement to the icmp-or case handled in D148986.
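A sketch of the first pattern (illustrative, not the exact test):
define i8 @and_allones_select(i8 %x, i8 %y) {
  %and = and i8 %x, %y
  %cmp = icmp eq i8 %and, -1
  %sel = select i1 %cmp, i8 %x, i8 -1
  ret i8 %sel
}
; If %x & %y == -1 then %x must be -1, so both arms are -1 and the select
; folds to -1.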
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D149229
Use simplifySelectWithICmpEq to handle the implied equalities from the icmp-or,
so that both ICMP_NE and ICMP_EQ are handled:
(X | Y) == 0 ? X : 0 --> 0 (commuted 2 ways)
(X | Y) != 0 ? 0 : X --> 0 (commuted 2 ways)
Fixes https://github.com/llvm/llvm-project/issues/62263
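A sketch of the first pattern (illustrative, not the exact test):
define i8 @or_zero_select(i8 %x, i8 %y) {
  %or = or i8 %x, %y
  %cmp = icmp eq i8 %or, 0
  %sel = select i1 %cmp, i8 %x, i8 0
  ret i8 %sel
}
; If %x | %y == 0 then %x must be 0, so both arms are 0 and the select
; folds to 0.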
Reviewed By: nikic, RKSimon
Differential Revision: https://reviews.llvm.org/D148986
Relative to the previous attempt, this includes a bailout for phi
nodes, whose arguments might refer to a previous cycle iteration.
We did not hit this before due to a fortunate deficiency of the
ConstantFoldInstOperands() API, which doesn't handle phi nodes,
unlike ConstantFoldInstruction().
-----
Instead of hardcoding a few instruction kinds, use the generic
interface now that we have it.
The primary effect of this is that intrinsics are now supported.
It's worth noting that this is still limited in that it does not
support vectors, so we can't remove e.g. existing fshl special
cases.
Instead of hardcoding a few instruction kinds, use the generic
interface now that we have it.
The primary effect of this is that intrinsics are now supported.
It's worth noting that this is still limited in that it does not
support vectors, so we can't remove e.g. existing fshl special
cases.
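As a hedged sketch of what the intrinsic support enables (hypothetical, not a
test from the patch):
declare i8 @llvm.umax.i8(i8, i8)

define i8 @select_umax_replaced(i8 %x, i8 %y) {
  %cmp = icmp eq i8 %y, 0
  %umax = call i8 @llvm.umax.i8(i8 %x, i8 %y)
  %sel = select i1 %cmp, i8 %umax, i8 %x
  ret i8 %sel
}
; With %y replaced by 0, umax(%x, 0) simplifies to %x, which matches the false
; arm, so the select folds to %x.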
The threading folds are the same for div/rem, and the isDivZero()
fold only differs in the return value.
This should be NFC, but as it slightly shuffles around the
order of the folds, it might not be exactly the same.