This benchmark is a poor test of `search_n`: a good `search_n` implementation can get through it in ~10 perfectly predictable steps. Drop it to avoid spending resources on it unnecessarily. Removing it also fixes the two benchmark sets having identical names.

Fixes #183832
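For context on the "~10 steps" claim, here is a minimal sketch of how a random-access `search_n` can skip ahead instead of inspecting every element. It is not the libc++ implementation. The helper name `skip_search_n`, the step counter, and the sizes in the usage note are assumptions chosen for illustration, not taken from the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the libc++ implementation.
// Returns the index of the first run of `count` elements equal to `value`,
// or data.size() if there is no such run. `steps` receives the number of
// element comparisons performed.
std::size_t skip_search_n(const std::vector<int>& data, std::size_t count,
                          int value, std::size_t& steps) {
    assert(count > 0);  // this sketch only handles a positive count
    steps = 0;
    const std::size_t size = data.size();
    std::size_t start = 0;  // first index of the current candidate window

    while (size - start >= count) {
        // Scan the window [start, start + count) from back to front.
        std::size_t i = start + count;
        while (i > start) {
            ++steps;
            if (data[i - 1] != value) break;
            --i;
        }
        if (i == start) return start;  // the whole window matched

        // data[i - 1] mismatched, so no run can contain index i - 1.
        // The next candidate window starts right after it.
        start = i;
    }
    return size;
}
```

Under these assumptions, searching a 1000-element haystack that contains no match for a run of 100 inspects only one element per window, so it finishes after 10 comparisons, each with the same outcome. That leaves very little work to measure.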