Add StackIdToIndex to ModuleSummaryIndexBitcodeReader to cache the
mapping from module-local stack id indices to their global indices in
the ModuleSummaryIndex's StackIds vector. This avoids repeated hash
lookups when processing callsite and allocation records.
This reduced the thin link time for a large target built with memprof
by ~16%.
Also add assertions to ensure that STACK_IDS records are processed only
once and that the cache is initially empty.
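
For illustration only, here is a minimal standalone sketch of the caching
idea. It is not the actual LLVM code: ToyIndex, ToyReader, parseStackIds,
globalIndex, and the Lookups counter are hypothetical names, and the
standard containers stand in for whatever the real reader uses. The hash
lookup happens once per stack id when the STACK_IDS record is parsed;
later records resolve a module-local index with a plain vector access.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the combined index: owns the global StackIds
// vector and deduplicates ids through a hash map.
struct ToyIndex {
  std::vector<uint64_t> StackIds;
  std::unordered_map<uint64_t, unsigned> StackIdPos;
  unsigned Lookups = 0; // Counts hash lookups, for illustration only.

  // Returns the global index of Id, appending it if it is new.
  unsigned addOrGetStackIdIndex(uint64_t Id) {
    ++Lookups;
    auto [It, Inserted] = StackIdPos.try_emplace(Id, StackIds.size());
    if (Inserted)
      StackIds.push_back(Id);
    return It->second;
  }
};

// Hypothetical stand-in for the per-module reader.
struct ToyReader {
  ToyIndex &Index;
  // Cache: module-local stack id index -> global index in Index.StackIds.
  std::vector<unsigned> StackIdToIndex;

  explicit ToyReader(ToyIndex &I) : Index(I) {}

  // Called for the module's STACK_IDS record. The cache must still be
  // empty, which also guards against filling it a second time.
  void parseStackIds(const std::vector<uint64_t> &Record) {
    assert(StackIdToIndex.empty() && "STACK_IDS cache already populated");
    StackIdToIndex.reserve(Record.size());
    for (uint64_t Id : Record)
      StackIdToIndex.push_back(Index.addOrGetStackIdIndex(Id));
  }

  // Called while processing callsite and allocation records: a vector
  // access instead of a hash lookup.
  unsigned globalIndex(unsigned LocalIdx) const {
    assert(LocalIdx < StackIdToIndex.size() && "bad local stack id index");
    return StackIdToIndex[LocalIdx];
  }
};
```

In this sketch, each module gets its own ToyReader sharing one ToyIndex, so
the number of hash lookups scales with the number of stack ids per module
rather than with the number of references to them in later records.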