Having a finite Depth (or recursion limit) for computeKnownBits is very
limiting, but is currently a load-bearing necessity, as all KnownBits
are recomputed on each call and there is no caching. As a prerequisite
for an effort to remove the recursion limit altogether, either using a
clever caching technique, or writing a easily-invalidable KnownBits
analysis, make the Depth argument in APIs in ValueTracking uniformly the
last argument with a default value. This would aid in removing the
argument when the time comes, as many callers that currently pass 0
explicitly are now updated to omit the argument altogether.
For code maintainability -- this may result in cases where we are
applying the optimization where it is not profitable, but those are
likely to be rare.
It is known that for vector whose element fits in i16 will be split and
scalarized in SelectionDag's type legalizer
(see SIISelLowering::getPreferredVectorAction).
LRO attempts to undo the scalarizing of vectors across basic block
boundary and shoehorn Values in VGPRs. LRO is beneficial for operations
that natively work on illegal vector types to prevent flip-flopping
between unpacked and packed. If we know that operations on vector will
be split and scalarized, then we don't want to shoehorn them back to
packed VGPR.
Operations that we know to work natively on illegal vector types usually
come in the form of intrinsics (MFMA, DOT8), buffer store, shuffle, phi
nodes to name a few.
It's possible that we are unable to coerce all the incoming values of a
PHINode (A). Thus, we are unable to coerce the PHINode. In this
situation, we previously would add the PHINode back to the ValMap. This
would cause a problem is PhiNode (B) was a user of A. In this scenario,
if B has been coerced, we would hit an assert regarding the incompatible
type between the PHINode and its incoming value.
Deleting non-coerced PHINodes from the map, and propagating the removal
to users, resolves the issue.