This reverts commit d80d5b923c6f611590a12543bdb33e0c16044d44.
It wasn't a particularly important transform to begin with, and it caused
some codegen regressions on targets that prefer `sitofp`, so drop it.
Might revisit along with adding an `nneg` flag to `uitofp` so it's easily
reversible for the backend.
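The reversibility mentioned above rests on signed and unsigned int-to-float conversion agreeing whenever the input is non-negative. A minimal sketch of that fact in standalone C++ (the function names are hypothetical and only model the IR instructions; this is not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Models `sitofp i32 %x to float`.
float sitofp(int32_t x) { return static_cast<float>(x); }

// Models `uitofp i32 %x to float`: the same 32 bits, read as unsigned.
float uitofp(int32_t x) {
  return static_cast<float>(static_cast<uint32_t>(x));
}

// True when both conversions produce the same float for x.
bool conversionsAgree(int32_t x) { return sitofp(x) == uitofp(x); }
```

For a negative input such as -1 the two conversions diverge, which is why a backend would need something like an `nneg` flag to switch between them safely.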
This patch enables more optimization after canonicalizing `fmul X, 0.0`
into a copysign.
I decided to implement this fold in InstCombine because
`computeKnownFPClass` may be expensive.
Alive2: https://alive2.llvm.org/ce/z/ASM8tQ
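For reference, the `fmul X, 0.0` to copysign relationship can be checked on concrete values. The sketch below is standalone C++ with a hypothetical helper name. It assumes X is finite and not NaN, since an infinite or NaN input makes the product NaN rather than a zero.

```cpp
#include <cassert>
#include <cmath>

// Returns true when x * 0.0 and copysign(0.0, x) are the same zero,
// comparing both the value and the sign bit.
bool mulZeroMatchesCopysign(double x) {
  double product = x * 0.0;
  double signedZero = std::copysign(0.0, x);
  return product == signedZero &&
         std::signbit(product) == std::signbit(signedZero);
}
```

The sign-bit comparison matters because `+0.0 == -0.0` is true, so a plain equality test would hide a wrong sign.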
This fixes the case where we would shrink an frem to half and then
bitcast to bfloat, producing invalid results. The transformation was
written under the assumption that there is only one type with a given
bit width.
Also add a strategic assert to CastInst::CreateFPCast to turn this
miscompilation into a crash.
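To illustrate why "one type per bit width" is an unsafe assumption, the sketch below decodes the same 16 bits as half and as bfloat. It is standalone C++ with hypothetical helper names, not the code touched by this patch, and the half decoder deliberately ignores infinities and NaNs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Decodes a bfloat bit pattern: it is the top 16 bits of an IEEE binary32.
float decodeBFloat(uint16_t bits) {
  uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float result;
  std::memcpy(&result, &wide, sizeof(result));
  return result;
}

// Decodes an IEEE binary16 (half) bit pattern: 1 sign bit, 5 exponent bits,
// 10 mantissa bits. Exponent 31 (infinity/NaN) is out of scope here.
float decodeHalf(uint16_t bits) {
  int sign = (bits >> 15) & 0x1;
  int exponent = (bits >> 10) & 0x1F;
  int mantissa = bits & 0x3FF;
  float magnitude;
  if (exponent == 0)
    magnitude = std::ldexp(mantissa / 1024.0f, -14); // zero or subnormal
  else
    magnitude = std::ldexp(1.0f + mantissa / 1024.0f, exponent - 15);
  return sign ? -magnitude : magnitude;
}
```

The pattern 0x3F80 is 1.0 as bfloat but 1.875 as half, so reinterpreting a half result as bfloat silently changes the value.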
Use KnownBits to infer the nneg flag on zext instructions.
Currently we only set nneg when converting sext -> zext, but don't set
it when we have a zext in the first place. If we want to use it in
optimizations, we should make sure the flag inference is consistent.
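The property behind the flag is that zero- and sign-extension agree whenever the sign bit of the input is known to be zero. A small standalone C++ sketch of that property follows. The helper names are hypothetical, and `signBitIsZero` inspects a concrete value, whereas the real KnownBits analysis reasons statically.

```cpp
#include <cassert>
#include <cstdint>

// Models `zext i8 %x to i32`.
uint32_t zext8to32(uint8_t x) { return x; }

// Models `sext i8 %x to i32`.
uint32_t sext8to32(uint8_t x) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(x)));
}

// Stand-in for the known-bits query: the sign bit of x is zero.
bool signBitIsZero(uint8_t x) { return (x & 0x80) == 0; }

// Checks every i8 value: a clear sign bit implies zext == sext.
bool nnegImpliesSameExtension() {
  for (int v = 0; v < 256; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    if (signBitIsZero(x) && zext8to32(x) != sext8to32(x))
      return false;
  }
  return true;
}
```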
An issue arose when handling shift amounts while simplifying
narrowed funnel shifts. Specifically, shift
amounts were incorrectly truncated when their type was
narrower than the target bit width. This has been addressed
by zero-extending `ShAmt` in such cases.
Fixes: https://github.com/llvm/llvm-project/issues/71463.
Proof: https://alive2.llvm.org/ce/z/5draKz.
`and` is generally better supported, so if we have a `ptrmask` anyway
we might as well use `and`.
Differential Revision: https://reviews.llvm.org/D156640
Closes #67166
The m_ZExtOrSExt / m_Trunc in the following code can match constant
expressions, which we don't want here. Make sure we bail out early
for non-immediate constants.
Builds on #67982 which recently introduced the nneg flag on a zext
instruction. InstCombine is one of our largest canonicalizers of zext
from non-negative sext instructions, so set the flag there.
This regression was triggered after commit f400daa, which fixed an
infinite loop issue.
In this case, we know the shift count is 0, so it will not be
triggered by the form of (iN (~X) u>> (N - 1)) in commit 21d3871,
where N is the bit width of X's type.
Fixes https://github.com/llvm/llvm-project/issues/68465.
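For readers unfamiliar with the (iN (~X) u>> (N - 1)) form: it yields 1 exactly when X is non-negative. The sketch below checks that identity for N = 32 in standalone C++. Pairing it with a zero-extended `icmp sgt X, -1` is an assumption about the other side of the fold, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `zext (icmp sgt i32 %x, -1) to i32` (assumed counterpart).
uint32_t zextOfSignTest(int32_t x) { return x > -1 ? 1u : 0u; }

// Models `lshr (xor i32 %x, -1), 31`, i.e. (~X) u>> (N - 1) with N = 32.
uint32_t shiftedNot(int32_t x) { return ~static_cast<uint32_t>(x) >> 31; }
```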
As per my proposal for how to eliminate debug intrinsics [0], in various
places in InstCombine prefer to insert using an instruction iterator rather
than an instruction pointer. This is so that we can eventually pass more
information in the iterator class. The call sites where I've changed the
spelling are those necessary for a stage2 clang build to produce an
identical binary in the coming no-debug-intrinsics mode.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
Differential Revision: https://reviews.llvm.org/D152543
Replace these with IRBuilder uses, as we don't (from a type
perspective) care about Constant results.
Switch the predicate to m_ImmConstant() instead of isa<Constant>
to guarantee that these do get folded away and our assumptions
about simplifications hold true.
/data/llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp:32:15: error: function 'decomposeSimpleLinearExpr' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
static Value *decomposeSimpleLinearExpr(Value *Val, unsigned &Scale,
^
1 error generated.
This is part of select constant expression removal. As there is
only a single place where this is used, just expand it to explicit
constant folding calls.
(Normally we'd just use the IRBuilder here, but this isn't possible
due to mergeUndefsWith use).
The reported compile-time regression has been addressed in
47f9109dff80a1abbe2705ee71dc0882b1d62274.
Additionally, this contains a change to immediately fold zext
with constant operand, even if it's used in a trunc. I'm not sure
if this is relevant for anything, but I noticed it as a behavioral
discrepancy when investigating this issue.
-----
InstCombine currently performs a constant folding attempt as part
of the main InstCombine loop, before visiting the instruction.
However, each visit method will also attempt to simplify the
instruction, which will in turn constant fold it. (Additionally,
we also constant fold instructions before the main InstCombine loop
and use a constant folding IR builder, so this is doubly redundant.)
There is one place where InstCombine visit methods currently don't
call into simplification, and that's casts. To be conservative,
I've added an explicit constant folding call there (though it has
no impact on tests).
This makes for a mild compile-time improvement and in particular
mitigates the compile-time regression from enabling load
simplification in be88b5814d9efce131dbc0c8e288907e2e6c89be.
Differential Revision: https://reviews.llvm.org/D144369
Increases compile time with ubsan on ARM from 3 to 14 min for a single file.
I uploaded a reproducer to D144369.
Also we have random timeouts on internal x86_64 builds.
Both bisected to this one.
This reverts commit 45a0b812fa13ec255cae91f974540a4d805a8d79.
The m_VScale() matcher is unusual in that it requires a DataLayout.
It is currently used to determine the size of the GEP type. However,
I believe it is sufficient to check for the canonical
<vscale x 1 x i8> form here -- I don't think there's a need to
recognize exotic variations like <vscale x 1 x i4> as a vscale
constant representation as well.
Differential Revision: https://reviews.llvm.org/D144566
InstCombine currently performs a constant folding attempt as part
of the main InstCombine loop, before visiting the instruction.
However, each visit method will also attempt to simplify the
instruction, which will in turn constant fold it. (Additionally,
we also constant fold instructions before the main InstCombine loop
and use a constant folding IR builder, so this is doubly redundant.)
There is one place where InstCombine visit methods currently don't
call into simplification, and that's casts. To be conservative,
I've added an explicit constant folding call there (though it has
no impact on tests).
This makes for a mild compile-time improvement and in particular
mitigates the compile-time regression from enabling load
simplification in be88b5814d9efce131dbc0c8e288907e2e6c89be.
Differential Revision: https://reviews.llvm.org/D144369
The LoopVectorizer emits the (scaled) element count as i32, which for
scalable VFs results in calls to @llvm.vscale.i32(). This value is scaled
and further zero-extended to i64.
The zero-extend can be folded away by executing the whole expression in i64
type using @llvm.vscale.i64(). Any logical `and` that would be needed to mask
the result can be further folded away by KnownBits analysis when
vscale_range is set.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D143016
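The arithmetic behind the widening can be sketched on concrete numbers. The standalone C++ below uses hypothetical function names. It assumes the fold is equivalent to computing in i64 and masking to the low 32 bits, with the mask becoming redundant once the scaled value is known to fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Models `zext (mul i32 vscale, factor) to i64`: multiply in 32 bits
// (wrapping), then zero-extend.
uint64_t narrowThenZext(uint32_t vscale, uint32_t factor) {
  return static_cast<uint64_t>(vscale * factor);
}

// The same value computed in 64 bits, masked back to the low 32 bits.
uint64_t wideWithMask(uint32_t vscale, uint32_t factor) {
  return (static_cast<uint64_t>(vscale) * factor) & 0xFFFFFFFFull;
}

// The 64-bit computation without the mask. It matches the other two only
// when the product fits in 32 bits, as a small vscale bound guarantees.
uint64_t wideNoMask(uint32_t vscale, uint32_t factor) {
  return static_cast<uint64_t>(vscale) * factor;
}
```

With a small bound on vscale, such as 16, and a small factor, all three agree. With an unbounded vscale, the unmasked form can differ, which is where the `and` would be required.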
Recommitting after fixing scalable vector crash.
Check for single smax pattern against zero when converting from a
small enough float.
Differential Revision: https://reviews.llvm.org/D142481
https://alive2.llvm.org/ce/z/2iC4oB
This is similar to changes made for zext + lshr:
21d3871b7c90
6c39a3aae1dc
The existing fold did not account for extra uses, so we
see some instruction count reductions in the test diffs.
This is intended to improve analysis (icmp likely has more
transforms than any other opcode), make other transforms
more symmetric with zext/lshr, and it can be inverted
in codegen if profitable.
As with the earlier changes, there is potential to uncover
infinite combine loops, but I have not found any yet.
There's no reason to use "CI" (cast instruction) when
we know that the value is a more specific (exact) type
of instruction (although we might want to common-ize some
of this code to eliminate duplication or logic diffs).
It's also visually difficult to distinguish between "CI",
"ICI", and "IC" acronyms (and those could change meaning
depending on context).
This was partially changed in earlier commits, so this
makes this pair of functions consistent.
This bit-hack transform would cause the new test to infinite loop
after 21d3871b7c90f85b3ae.
The deleted transform has existed for a very long time,
but the profitable parts appear to be handled by other
folds now. This fold could replace 2 instructions with
4 instructions, so it was always in danger of going
overboard.
No tests regress by removing the whole thing.
In the changed tests, we avoid creating extra instructions,
and there are no obvious regressions in IR tests at least.
Codegen should be able to create the shift+mask form if that
is profitable.
This is a more general fix for issue #59897 than 0eedc9e56712.