SimplifyDemandedFPClass isn't properly adjusting the IRBuilder
insert point, so this could insert at the wrong point if the
simplification happens in one of the recursive calls. There are a few
more of these to fix.
Start trying to use SimplifyDemandedFPClass on instructions, starting
with fmul. This subsumes the old transform on multiply by 0. The
main change is the introduction of nnan/ninf. I do not think anywhere
was systematically trying to introduce fast math flags before, though
a few odd transforms would set them.
Previously we only called SimplifyDemandedFPClass on function returns
with nofpclass annotations. Start following the pattern of
SimplifyDemandedBits, where this will be called from relevant root
instructions.
I was wondering if this should go into AggressiveInstCombine, but that
apparently does not make use of InstCombineInternal's worklist.
Previously, `SimplifyDemandedUseBits` for `add` instructions only
used known zeros from the RHS to simplify the LHS. It failed to
handle the symmetric case where the LHS has known zeros and the
result does not demand the low bits.
This patch implements this missing optimization, allowing the RHS
constant to be shrunk when the LHS low bits are known zero and unused.
Proof: https://alive2.llvm.org/ce/z/6v9iFY
Fixed: https://github.com/llvm/llvm-project/issues/135411
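The arithmetic behind the add case can be sketched in standalone C++. This is only an illustration under assumptions, not the code from the patch: shrinkAddConstant and demandedBitsAgree are hypothetical names, and plain 32-bit masks stand in for LLVM's KnownBits and APInt. If the low n bits of the LHS are known zero, adding the low n bits of the constant cannot carry into bit n, so those constant bits can be cleared whenever the result's low n bits are not demanded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): clear the low bits of an add's
// constant operand that cannot influence any demanded bit of the result.
//   knownZeroLHS: bits of the LHS known to be zero
//   demanded:     bits of the result that some user actually reads
inline uint32_t shrinkAddConstant(uint32_t c, uint32_t knownZeroLHS,
                                  uint32_t demanded) {
  // Count contiguous low bits that are known zero in the LHS and unused
  // in the result.
  unsigned n = 0;
  while (n < 32 && ((knownZeroLHS >> n) & 1u) && !((demanded >> n) & 1u))
    ++n;
  if (n == 32)
    return 0; // nothing is demanded at all
  uint32_t lowMask = (n == 0) ? 0u : ((1u << n) - 1u);
  // The LHS contributes nothing below bit n, so the low part of the sum
  // is just the low part of c and never carries into bit n.
  return c & ~lowMask;
}

// Check that the original and the shrunk constant agree on every
// demanded bit for one concrete LHS value.
inline bool demandedBitsAgree(uint32_t x, uint32_t c, uint32_t knownZeroLHS,
                              uint32_t demanded) {
  assert((x & knownZeroLHS) == 0 && "x must honor the known-zero mask");
  uint32_t shrunk = shrinkAddConstant(c, knownZeroLHS, demanded);
  return ((x + c) & demanded) == ((x + shrunk) & demanded);
}
```

For example, with the low four bits of the LHS known zero and only the upper 28 bits demanded, the constant 0x1F shrinks to 0x10.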
In SimplifyDemandedFPClass, stop using nsz when there's a
mismatch in the sign of 0 for the various min and maxes.
Alive2 doesn't like it: https://alive2.llvm.org/ce/z/ZyhSGA,
presumably because of the possible mismatch between the stored
value and the propagated one. Maybe it would be OK if nsz were on all
the uses.
Match the multi-use case's logic for understanding no-nan/no-inf
context.
Also only apply the nsz handling in the single use case. alive2 seems to
treat nsz as nondeterministic for each use.
This fixes missed flag inference in some cases, due to not inferring
that a no-nan result implies a no-nan source. Also start treating
explicit nofpclass attributes as a leaf value, like a constant or
argument.
As part of the profcheck effort we are trying to explicitly annotate
select instructions where we cannot reasonably synthesize profile
information as having an unknown profile. This does that for the case
introduced in 0993d69bc35cfdd4f3a904a603701e66906e8987.
Clean up some now redundant propagation of known-result to known-source
cases. Also move the application of the demanded mask to individual
cases, since the intermediate results are often used.
When reporting the known class result, apply the demanded mask to
filter out rejected cases. This can simplify known-source checks
further up the call stack. There are a few improved test diffs. This
does not yet try to clean up now redundant result checks.
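A minimal sketch of the masking idea, using a simplified bitmask in place of the real floating-point class enum. The names below (the kB* constants, reportKnownClasses, isKnownNever) are assumptions made for illustration and are not the identifiers used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a floating-point class mask (assumed names).
enum : uint32_t {
  kBNan = 1u << 0,
  kBNegInf = 1u << 1,
  kBNegNormal = 1u << 2,
  kBPosNormal = 1u << 3,
  kBPosInf = 1u << 4,
  kBAll = (1u << 5) - 1u
};

// Classes the value may take, after dropping the ones the user rejects.
// A class outside the demanded mask can never be observed, so reporting
// it would only make callers more pessimistic.
inline uint32_t reportKnownClasses(uint32_t possible, uint32_t demanded) {
  assert((possible & ~kBAll) == 0 && "mask uses classes outside the model");
  return possible & demanded;
}

// True if none of the classes in `test` can occur.
inline bool isKnownNever(uint32_t known, uint32_t test) {
  return (known & test) == 0;
}
```

Here a source that may be NaN, +normal or +inf, used where only finite non-NaN values are demanded, is reported as +normal only, so a caller further up the stack can conclude the value is never NaN without repeating the check.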
Do an initial brute-force scope_exit to ensure these are cleared.
Later we can do a better job by pushing this into the individual
instruction cases.
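The scope-exit pattern referred to here can be sketched as follows. This is a hand-rolled guard written so the example is self-contained (LLVM ships its own utility for this in llvm/ADT/ScopeExit.h), and the scratch state being cleared is purely hypothetical since the message does not name what is reset.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Minimal scope-exit guard: runs the callable when the guard is destroyed.
template <typename F> class ScopeExit {
  F fn;
  bool engaged = true;

public:
  explicit ScopeExit(F f) : fn(std::move(f)) {}
  ScopeExit(ScopeExit &&other)
      : fn(std::move(other.fn)), engaged(other.engaged) {
    other.engaged = false; // the moved-from guard must not fire
  }
  ScopeExit(const ScopeExit &) = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;
  ~ScopeExit() {
    if (engaged)
      fn();
  }
};

template <typename F> ScopeExit<F> makeScopeExit(F f) {
  return ScopeExit<F>(std::move(f));
}

// Hypothetical per-call scratch state that must be empty between calls.
struct SimplifierScratch {
  std::vector<int> pending;
};

// Every return path, including the early one, leaves the scratch cleared.
inline int simplifyOnce(SimplifierScratch &s, int v) {
  assert(s.pending.empty() && "scratch must start clean");
  auto cleanup = makeScopeExit([&s] { s.pending.clear(); });
  s.pending.push_back(v);
  if (v < 0)
    return 0; // early exit: the guard still clears the scratch
  return v * 2;
}
```

The brute-force version places one guard at the top of the function; pushing the cleanup into individual cases would replace it with narrower guards.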
2b03d68398819fe3608c680d6c25aa9d5a043c03 factored this into a function
and used it from the new place, but forgot to delete the old code from
the original location.
SimplifyDemandedFPClass's handling of fabs recently became smarter in
the multiple use case than single. Unify these so the single use case
is equally smart. This includes propagating ninf / nnan context into
the instruction, and accounting for nsz if the only bit difference is
for zero.
Alive isn't particularly happy with this in the case where
one of the inputs could be zero, but I think
Alive is wrong here: https://alive2.llvm.org/ce/z/dF7V6k
nsz shouldn't permit introducing a -0 result where
there wasn't one in the input here.
Refine handling of minimum/maximum and minimumnum/maximumnum. The
previous folds to input were based on sign bit checks. This was too
conservative with 0s. This can now consider -0 as less than or equal
to +0 as appropriate, and account for nsz. It can additionally handle
cases like one half is known positive normal and the other subnormal.
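As a rough illustration of a class-ordering check for minimum, here is a sketch assuming a simplified class mask whose bit positions follow numeric order. The names (the kO* constants, minimumFoldsToLHS) are hypothetical, nsz is deliberately left out, and this is not the logic from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Simplified class mask; apart from NaN, bit order matches numeric order
// (assumed layout for this sketch only).
enum : uint32_t {
  kONan = 1u << 0,
  kONegInf = 1u << 1,
  kONegNormal = 1u << 2,
  kONegSubnormal = 1u << 3,
  kONegZero = 1u << 4,
  kOPosZero = 1u << 5,
  kOPosSubnormal = 1u << 6,
  kOPosNormal = 1u << 7,
  kOPosInf = 1u << 8,
  kOAll = (1u << 9) - 1u
};

inline int lowestRank(uint32_t m) {
  for (int i = 0; i < 32; ++i)
    if (m & (1u << i))
      return i;
  return -1;
}

inline int highestRank(uint32_t m) {
  for (int i = 31; i >= 0; --i)
    if (m & (1u << i))
      return i;
  return -1;
}

// minimum(lhs, rhs) can be replaced by lhs when every class lhs may take
// sorts strictly below every class rhs may take. -0 ranks below +0,
// matching IEEE-754 minimum. The comparison is strict because two values
// in the same class (for example both +normal) are not ordered by class.
inline bool minimumFoldsToLHS(uint32_t lhs, uint32_t rhs) {
  assert(((lhs | rhs) & ~kOAll) == 0 && "mask uses classes outside the model");
  if ((lhs | rhs) & kONan)
    return false; // NaN handling is out of scope for this sketch
  if (lhs == 0 || rhs == 0)
    return false; // no information
  return highestRank(lhs) < lowestRank(rhs);
}
```

A check based only on the sign bit could not decide the subnormal versus normal case, since both operands are positive; the class ordering can.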
Note some of the tests currently fail with alive, but not
due to this patch: namely, those performing the fadd x, 0 -> x
simplification in functions with non-IEEE denormal handling.
The existing instsimplify ignores the denormals-are-zero hazard by
checking cannotBeNegativeZero instead of isKnownNeverLogicalZero.
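The signed-zero half of this hazard is easy to reproduce in standalone C++, assuming IEEE-754 doubles and the default round-to-nearest mode. The helper names are hypothetical. The denormal half is not shown because it depends on the target's flush mode: when denormal inputs are treated as zero, a value that is not bitwise -0 can still behave as a zero in the add, which is the difference between the two checks named above.

```cpp
#include <cassert>
#include <cmath>

// Equal in value and in sign bit, so that -0.0 and +0.0 are distinguished.
inline bool sameValueAndSign(double a, double b) {
  return a == b && std::signbit(a) == std::signbit(b);
}

// Does x + (+0.0) give back exactly x? volatile keeps the compiler from
// folding the add away.
inline bool addPosZeroIsIdentity(double x) {
  assert(!std::isnan(x) && "NaN never compares equal, so skip it here");
  volatile double vx = x;
  volatile double zero = 0.0;
  double sum = vx + zero;
  return sameValueAndSign(sum, x);
}

// Does x + (-0.0) give back exactly x?
inline bool addNegZeroIsIdentity(double x) {
  assert(!std::isnan(x) && "NaN never compares equal, so skip it here");
  volatile double vx = x;
  volatile double negZero = -0.0;
  double sum = vx + negZero;
  return sameValueAndSign(sum, x);
}
```

Adding +0.0 turns -0.0 into +0.0, so that fold needs the input to be known not to be a negative zero, while adding -0.0 preserves every non-NaN input.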
Also note the self handling doesn't really do anything yet, other
than propagate consistent known-fpclass information until there is
multiple use support.
This also leaves behind the original ValueTracking support, without
switching to the new KnownFPClass::fadd utility. This will be easier
to clean up after the subsequent fsub support patch.