Use bucket sort to schedule comm batches in distributed-memory
This avoids quadratic runtime in the previous "batchy toposort".
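
For illustration only, a minimal sketch of the general idea, not the code in
this change: assuming each batch is the set of nodes at the same dependency
depth, one linear pass computes the depths and a second pass drops each node
into its bucket. The names `schedule_batches` and `deps` are hypothetical, and
every dependency is assumed to appear as a key of `deps`.

```python
from collections import defaultdict, deque


def schedule_batches(deps):
    """Group DAG nodes into batches by dependency depth.

    `deps` maps each node to the collection of nodes it depends on.
    Returns a list of batches; every node's dependencies sit in earlier batches.
    """
    users = defaultdict(list)  # reverse edges: node -> nodes that depend on it
    indegree = {}
    for node, preds in deps.items():
        indegree[node] = len(preds)
        for pred in preds:
            users[pred].append(node)

    # Kahn-style traversal: depth = length of the longest path from any root.
    level = {node: 0 for node in deps}
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    while queue:
        node = queue.popleft()
        for user in users[node]:
            level[user] = max(level[user], level[node] + 1)
            indegree[user] -= 1
            if indegree[user] == 0:
                queue.append(user)

    if any(deg > 0 for deg in indegree.values()):
        raise ValueError("dependency graph contains a cycle")

    # Bucket step: one pass over the nodes, indexed by depth.
    buckets = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for node in deps:
        buckets[level[node]].append(node)
    return buckets
```

Each node and each edge is visited a constant number of times, so the sketch
runs in time linear in the size of the graph.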
Co-authored-by: Andreas Kloeckner <inform@tiker.net>