If both the character and string are known, but the length
potentially isn't, we can optimize the memchr() call to a select
of either the known position of the character or null.
Split off from https://reviews.llvm.org/D122836.
If the memchr() size is 1, then we can convert the call into a
single-byte comparison. This works even if both the string and the
character are unknown.
Split off from https://reviews.llvm.org/D122836.
This change was previously reverted because I forgot rerunning
update_test_checks.py and tests were not actually baseline.
Extracted from: https://reviews.llvm.org/D122757
This is a hacky fix for:
https://github.com/llvm/llvm-project/issues/54558
As discussed there, codegen regressed when we opened up this transform
to allow extra uses ( 61580d0949fd3465 ), and it's not clear how to
undo the transforms at the later stage of compilation.
As noted in the code comments, there's a set of remaining folds that
are still limited to one-use, so we can try harder to refine and
expand the limitations on these folds, but it's likely to be an
up-and-down battle as we find and overcome similar regressions.
Differential Revision: https://reviews.llvm.org/D122909
This is a retry of 9397bdc67eb2 - that was reverted until
we had a clang warning in place to alert users about a
possible mistake in source. The warning was added with
ab982eace6e4.
This is noted as a missing clang warning in #54222,
but it is also a missing optimization opportunity.
Alive2 proofs:
https://alive2.llvm.org/ce/z/Q8drDqhttps://alive2.llvm.org/ce/z/pE6LRt
I don't see a single conversion for all predicates
using "getFCmpCode" logic, so other predicates are
left as a TODO item.
These two are equivalent,
and i *think* the `and` form is more-ish canonical.
General proof: https://alive2.llvm.org/ce/z/RrF5s6
If constant on the (outer) `xor` is an `undef`,
the whole lane is dead: https://alive2.llvm.org/ce/z/mu4Sh2
However, if the constant on the (inner) `or` is an `undef`,
we must sanitize it first: https://alive2.llvm.org/ce/z/MHYJL7
I guess, producing a zero `and`-mask is optimal in that case.
alive-tv is happy about the entirety of `xor-of-or.ll`.
Before we gave up if a call through bitcast had parameter attributes.
Interestingly, we allowed attributes for the return value already. We
now handle both the same way, namely, we drop the ones that are
incompatible with the new type and keep the rest. This cannot cause
"more UB" than initially present.
Differential Revision: https://reviews.llvm.org/D119967
We already do this for SelectionDAG, but we're missing it here.
Noticed while re-triaging PR21929
Differential Revision: https://reviews.llvm.org/D122340
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes#54364
Differential Revision: https://reviews.llvm.org/D122390
These are based on tests that were included in the abandoned D122299.
Comments indicate what should or should not happen if we change
behavior in Negator.
The first attempt at this missed a validity check.
This version includes a test of the narrow source
type for modulo-16-bits.
Original commit message:
This is the IR counterpart to 370ebc9d9a573d6
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
This reverts commit 9e9bda2e8f5b88715bad767a4b7740df32b040d2.
This causes a backend error when building the Linux kernel for arm64.
See https://reviews.llvm.org/D122166 for a simplified reproducer.
The motivation for this is that while both memcpyopt and dse will catch this case, both are limited by MSSA's walk back threshold when finding clobbers. As such, if you have a memcpy of an otherwise dead alloca placed towards the end of a long basic block with lots of other memory instructions, it would be missed. This is a bit undesirable for such an "obviously" useless bit of code.
As noted in comments, we should probably generalize instcombine's escape analysis peephole (see visitAllocInst) to allow read xor write. Doing that would subsume this code in a more general way, but is also a more involved change. For the moment, I went with the easiest fix.
There's a potential miscompile or missed optimization with
propagating 'nsw' in the transform proposed in D122013, so
we need at least one more test for coverage.
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
This is an IR implementation of a transform suggested in D120648.
The "swaps cancel" test models the motivating optimization from
that proposal.
Alive2 checks (as noted in the other review, we could use
knownbits to handle shift-by-variable-amount, but that can be an
enhancement patch):
https://alive2.llvm.org/ce/z/pXUaRfhttps://alive2.llvm.org/ce/z/ZnaMLf
Differential Revision: https://reviews.llvm.org/D122010
This is the IR counterpart to 370ebc9d9a573d6
which provided a bswap narrowing fix for issue #53867.
Here we can be more general (although I'm not sure yet
what would happen for illegal types in codegen - too
rare to worry about?):
https://alive2.llvm.org/ce/z/3-CPfo
This will be more effective if we have moved the shift
after the bswap as proposed in D122010, but it is
independent of that patch.
Differential Revision: https://reviews.llvm.org/D122166
This patch tries to sink instructions when they are only used in a successor block.
This is a further enhancement patch based on Anna's commit:
D109700, which allows sinking an instruction having multiple uses in a single user.
In this patch, sink instructions with multiple users in a single successor block will be supported.
It could fix a known issue from rust:
https://github.com/rust-lang/rust/issues/51346#issuecomment-394443610
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D121585
InstCombine llvm.aarch64.sve.sel to select. This allows an existing instCombine
added in 20b0fa91c9ee to fire.
Differential Revision: https://reviews.llvm.org/D121792
Reapply with an explicit check for multi-edges, as the expected
behavior of multi-edge dominance is unclear (D120811).
-----
For conditional branches, we know the value is i1 0 or i1 1 along
the outgoing edges. For switches we can apply exactly the same
optimization, just with the known values determined by the switch
cases.
This can be viewed as swapping the select arms:
https://alive2.llvm.org/ce/z/jUvFMJ
...so we don't have the 'nsz' problem with the more general fold.
This unlocks other folds for the motivating fabs example.
This was discussed in issue #38828.
If we have a logical and/or in select form and the true/false operand
is an fcmp with poison generating FMF, we won't be able to fold it
to an and/or instruction. This prevents us from optimizing the case
where it is a logical operation of two fcmps with identical operands.
This patch adds explicit checks for this case that doesn't rely on
converting to and/or to do the optimization. It reuses the existing
foldLogicOfFCmps, but adds a new flag to disable the other combine
that is inside that function.
FMF flags from the two FCmps are intersected using the logic added in
D121243. The FIXME has been updated to indicate that we can only use
a union for the non-select form.
This allows us to optimize cases like this from compare-fp-3.c in the
gcc torture suite with fast math.
void
test1 (float x, float y)
{
if ((x==y) && (x!=y))
link_error0();
}
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D121323