Motivated by IREE's need to support ArgMax/ArgMin-like operations using DPP and ballot operations (see https://github.com/iree-org/iree/discussions/23609#discussioncomment-16311655 for details), this PR adds a `gpu.ballot` operation to the MLIR GPU dialect, with lowerings to ROCDL, NVVM, and SPIR-V.

Assisted-by: [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Bangtian Liu <liubangtian@gmail.com>