The code for `memcpy` is the same as in #148204 but it fixes the build
bot error by using `static_assert(cpp::always_false<decltype(access)>)`
instead of `static_assert(false)` (older compilers fail on
`static_assert(false)` in the `else` branch of an `if constexpr`).
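
For reviewers unfamiliar with the idiom, here is a minimal standalone sketch of why a dependent condition avoids the error. It is not the code from this patch: `always_false` is a local stand-in for `cpp::always_false`, and `access_width` is a hypothetical function that dispatches on a type parameter. Requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Local stand-in for cpp::always_false (assumption: same shape as the
// libc helper). The value depends on T, so the compiler cannot evaluate
// it until the template is instantiated.
template <typename T> inline constexpr bool always_false = false;

// Hypothetical dispatch function, for illustration only.
template <typename T> constexpr std::size_t access_width() {
  if constexpr (std::is_same_v<T, std::uint8_t>) {
    return 1;
  } else if constexpr (std::is_same_v<T, std::uint32_t>) {
    return 4;
  } else {
    // A plain static_assert(false) here is rejected by older compilers
    // even when this branch is never instantiated. The dependent form
    // only fires if an unsupported T actually reaches this branch.
    static_assert(always_false<T>, "unsupported access type");
  }
}

static_assert(access_width<std::uint8_t>() == 1);
static_assert(access_width<std::uint32_t>() == 4);
```

Instantiating `access_width<float>()` would trigger the assertion at compile time, which is the intended diagnostic.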
The code for `memset` is new and vastly improves performance over the
current byte-by-byte implementation.
Both `memset` and `memcpy` implementations use prefetching for sizes >=
64. This slightly lowers performance for sizes between 64 and 256 but
improves throughput for larger sizes.
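
As a rough illustration of the size-gated prefetch described above, here is a sketch of a `memset`-like loop. It is not the implementation in this patch: the function name, the 64-byte block size, and the prefetch distance are assumptions, and `__builtin_prefetch` is the GCC/Clang builtin rather than whatever the libc code uses internally.

```cpp
#include <cstddef>

// Hypothetical constants; only the 64-byte threshold comes from the
// description above. Block size and prefetch distance are assumptions.
constexpr std::size_t kPrefetchThreshold = 64;
constexpr std::size_t kBlock = 64;

// Illustrative sketch only, not the patch's memset.
inline void *memset_sketch(void *dst, int value, std::size_t count) {
  auto *p = static_cast<unsigned char *>(dst);
  const auto byte = static_cast<unsigned char>(value);
  if (count >= kPrefetchThreshold) {
    while (count >= kBlock) {
      // Hint that the next block will be written soon (rw = 1).
      // The address is at most one past the end of the buffer, and a
      // prefetch is only a hint, so it does not fault.
      __builtin_prefetch(p + kBlock, 1, 3);
      for (std::size_t i = 0; i < kBlock; ++i)
        p[i] = byte;
      p += kBlock;
      count -= kBlock;
    }
  }
  // Small sizes and the tail skip prefetching entirely.
  for (std::size_t i = 0; i < count; ++i)
    p[i] = byte;
  return dst;
}
```

The extra prefetch instruction is pure overhead when the buffer spans only a few blocks, which is consistent with the small regression reported for sizes between 64 and 256.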