
Summary: `malloc` needs to return a larger size now that it is aligned properly and we use a bunch of threads. The `match_any` test was also wrong because it assumed a 32-bit lanemask.