Fixes #184877
This change has three parts:
1. copy the padded cbuffer from memory to a local alloca
2. switch to using the new `getFlattenedIndex` helpers for index
generation
3. convert row-major to column-major indices in codegen depending on
LangOptions
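
For reviewers, the index conversion in step 3 can be sketched roughly as below. This is a standalone illustration, not the code in this patch: `flattenRowMajor`, `flattenColumnMajor`, and `rowMajorToColumnMajor` are hypothetical names, and the actual `getFlattenedIndex` helpers may take different parameters.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not the clang API.

// Flat index of element (Row, Col) when rows are stored contiguously.
inline unsigned flattenRowMajor(unsigned Row, unsigned Col, unsigned NumCols) {
  return Row * NumCols + Col;
}

// Flat index of element (Row, Col) when columns are stored contiguously.
inline unsigned flattenColumnMajor(unsigned Row, unsigned Col,
                                   unsigned NumRows) {
  return Col * NumRows + Row;
}

// Map a row-major flat index to the column-major flat index of the same
// logical element in a NumRows x NumCols matrix.
inline unsigned rowMajorToColumnMajor(unsigned Idx, unsigned NumRows,
                                      unsigned NumCols) {
  assert(Idx < NumRows * NumCols && "flat index out of range");
  unsigned Row = Idx / NumCols;
  unsigned Col = Idx % NumCols;
  return flattenColumnMajor(Row, Col, NumRows);
}
```

For a 2x3 matrix, element (0, 1) sits at flat index 1 in row-major order and at flat index 2 in column-major order, so `rowMajorToColumnMajor(1, 2, 3)` yields 2.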