The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for less instructions.