This PR simplifies `__bitset::__init` into a more compact and readable
form, which avoids redundant computations of a `size_t` value and
eliminates the overhead of a local array.
This patch optimizes `bitset::to_string` by replacing the existing bit-by-bit processing with a more efficient
bit traversal strategy. Instead of checking each bit sequentially, we leverage `std::__countr_zero` to efficiently
locate the next set bit, skipping over consecutive zero bits. This greatly accelerates the conversion process,
especially for sparse `bitset`s where zero bits dominate. To ensure similar improvements for dense `bitset`s, we
exploit symmetry by inverting the bit pattern, allowing us to apply the same optimized traversal technique. Even
for uniformly distributed `bitset`s, the proposed approach offers measurable performance gains over the existing
implementation.
Benchmarks demonstrate substantial improvements, achieving up to 13.5x speedup for sparse `bitset`s with
`Pr(true bit) = 0.1`, 16.1x for dense `bitset`s with `Pr(true bit) = 0.9`, and 8.3x for uniformly distributed
`bitset`s with `Pr(true bit) = 0.5)`.