This PR implements a generic, backend-agnostic parallel `std::is_sorted` based on `std::transform_reduce`. This approach is suboptimal compared to a direct, backend-specific implementation because it cannot terminate early and it requires a reduction operation. Even so, it shows a speedup when the dataset is large enough and the comparator is not entirely trivial. Parent issue: #99938