See D153611. Tests for the cost of icmp(and, 0) are added, in addition to
expanding the extractelements-to-shuffle.ll test, which has always been a bit
simple, to include a more complete example with both a vector and scalar
version. The icmp(and, 0) costs are targetting at improving the second when the
cost of vector inserts and extracts is lowered.