This patch recognizes a high-half multiply that has been written out by parts
and folds it into a single larger multiply.
Considering a multiply made up of high and low parts, we can split the
multiply into:
x * y == (xh*T + xl) * (yh*T + yl)
where `xh == x>>32` and `xl == x & 0xffffffff` (likewise `yh` and `yl` for `y`),
and `T = 2^32`.
This expands to
xh*yh*T*T + xh*yl*T + xl*yh*T + xl*yl
which I find helpful to draw as
[ xh*yh ]
    [ xh*yl ]
    [ xl*yh ]
        [ xl*yl ]
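As a quick sanity check of the expansion, the four partial products can be shifted into place and summed. This is only an illustrative sketch, not code from the patch: it assumes 64-bit inputs and a compiler that provides the `unsigned __int128` extension (GCC, Clang), and the names `u128` and `productFromParts` are made up for the example.

```cpp
#include <cstdint>

// Assumption: unsigned __int128 is available (GCC/Clang extension).
using u128 = unsigned __int128;

// Rebuilds the full 128-bit product from the four 32x32->64 partial products.
// Hypothetical helper for illustration only.
static u128 productFromParts(uint64_t x, uint64_t y) {
  const uint64_t Mask = 0xffffffffull;
  uint64_t xh = x >> 32, xl = x & Mask;
  uint64_t yh = y >> 32, yl = y & Mask;
  return ((u128)(xh * yh) << 64)  // xh*yh*T*T
       + ((u128)(xh * yl) << 32)  // xh*yl*T
       + ((u128)(xl * yh) << 32)  // xl*yh*T
       + (u128)(xl * yl);         // xl*yl
}
```

For any inputs, `productFromParts(x, y)` should equal `(u128)x * y`.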
We are looking for the "high" half, which is xh*yh + (xh*yl>>32) + (xl*yh>>32) +
carries. The carries make this difficult, and there are multiple ways of
representing them. The ones we attempt to support here are:
Carry: xh*yh + carry + (lowsum>>32)
  carry = lowsum < xh*yl ? 0x100000000 : 0
  lowsum = xh*yl + xl*yh + (xl*yl>>32)
Ladder: xh*yh + (c2>>32) + (c3>>32)
  c2 = xh*yl + (xl*yl>>32); c3 = (c2&0xffffffff) + xl*yh
Carry4: xh*yh + carry + (crosssum>>32) + (((xl*yl>>32) + (crosssum&0xffffffff)) >> 32)
  crosssum = xh*yl + xl*yh
  carry = crosssum < xh*yl ? 0x100000000 : 0
Ladder4: xh*yh + ((xl*yh)>>32) + ((xh*yl)>>32) + (low>>32)
  low = ((xl*yl)>>32) + ((xl*yh)&0xffffffff) + ((xh*yl)&0xffffffff)
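The four forms can be compared against a reference high multiply. The sketch below is not taken from the patch: it assumes 64-bit operands, uses the `unsigned __int128` extension (GCC, Clang) for the reference, and all names (`Parts`, `split`, `mulhiCarry`, and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static const uint64_t Mask = 0xffffffffull;

// Reference: upper 64 bits of the 128-bit product (assumes __int128 support).
static uint64_t mulhiRef(uint64_t x, uint64_t y) {
  return (uint64_t)(((unsigned __int128)x * y) >> 64);
}

// The four 32x32->64 partial products: xh*yh, xh*yl, xl*yh, xl*yl.
struct Parts { uint64_t hh, hl, lh, ll; };

static Parts split(uint64_t x, uint64_t y) {
  uint64_t xh = x >> 32, xl = x & Mask;
  uint64_t yh = y >> 32, yl = y & Mask;
  return {xh * yh, xh * yl, xl * yh, xl * yl};
}

// Carry: one 64-bit sum that may wrap once; the wrap is worth 2^32 up high.
static uint64_t mulhiCarry(uint64_t x, uint64_t y) {
  Parts p = split(x, y);
  uint64_t lowsum = p.hl + p.lh + (p.ll >> 32);
  uint64_t carry = lowsum < p.hl ? (1ull << 32) : 0;
  return p.hh + carry + (lowsum >> 32);
}

// Ladder: accumulate in two steps so that neither sum can wrap.
static uint64_t mulhiLadder(uint64_t x, uint64_t y) {
  Parts p = split(x, y);
  uint64_t c2 = p.hl + (p.ll >> 32);
  uint64_t c3 = (c2 & Mask) + p.lh;
  return p.hh + (c2 >> 32) + (c3 >> 32);
}

// Carry4: wrap-checked cross sum, with its low half added to xl*yl>>32.
static uint64_t mulhiCarry4(uint64_t x, uint64_t y) {
  Parts p = split(x, y);
  uint64_t crosssum = p.hl + p.lh;
  uint64_t carry = crosssum < p.hl ? (1ull << 32) : 0;
  uint64_t low = (p.ll >> 32) + (crosssum & Mask);
  return p.hh + carry + (crosssum >> 32) + (low >> 32);
}

// Ladder4: split both cross terms into halves; no sum can wrap.
static uint64_t mulhiLadder4(uint64_t x, uint64_t y) {
  Parts p = split(x, y);
  uint64_t low = (p.ll >> 32) + (p.lh & Mask) + (p.hl & Mask);
  return p.hh + (p.lh >> 32) + (p.hl >> 32) + (low >> 32);
}

// Asserts that all four forms agree with the reference for one input pair.
static void checkAgainstReference(uint64_t x, uint64_t y) {
  uint64_t ref = mulhiRef(x, y);
  assert(mulhiCarry(x, y) == ref);
  assert(mulhiLadder(x, y) == ref);
  assert(mulhiCarry4(x, y) == ref);
  assert(mulhiLadder4(x, y) == ref);
}
```

Calling `checkAgainstReference(UINT64_MAX, UINT64_MAX)` exercises the wrapping case in the Carry and Carry4 forms, since `xh*yl + xl*yh` does not fit in 64 bits for those inputs.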
They all start by matching `xh*yh` + 2 or 3 other operands. The bottom of the
tree is `xh*yh`, `xh*yl`, `xl*yh` and `xl*yl`.
Based on #156879 by @c-rhodes