There are some intrinsics are using i16 vectors in place of bfloat
vectors.
Move towards making bf16 vectors legal so these can migrate. Leave the
larger vectors for a later change.
Depends #76213#76214
Fix bitcast test, which was splitting apart phis intended to force
bitcasts that survive all the way to selection.
Disable the amdgpu-codegenprepare phi splitting, which defeats the technique
of using a phi to ensure a bitcast reaches all the way to selection. Also
add a variety of bfloat tests. These probably need revisiting to avoid the
cast folding into argument loads. Also round out set of bfloat bitcast and
ABI tests.
Add codegen tests for more bf16 operations The promotion of these works
contrary to the comment.
Surprisingly these were getting legalized to something
zero initialized.
This fixes an infinite loop when combining some vector types.
Also fixes zero initializing some undef values.
SimplifyDemandedVectorElts / SimplifyDemandedBits are not checking
for the legality of the output undefs they are replacing unused
operations with. This resulted in turning vectors into undefs
that were later re-legalized back into zero vectors.