Calibrate the performance model
For a number of reasons, including load balancing and FMM balancing, it would be good to have calibrated performance models (i.e. ones that predict actual run time, not just asymptotic cost). An easy way to get that would be to put timing-data-collecting decorators on the wrangler methods and divide the measured times by the corresponding operation counts from the performance model.
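A minimal sketch of that idea, using only the standard library. Everything named here (`TIMINGS`, `collect_timing`, `calibrate`, `ToyWrangler`, and the stage name used as a key) is hypothetical and stands in for the real wrangler and the real counting data:

```python
import functools
import time
from collections import defaultdict

# Hypothetical registry: method name -> list of elapsed wall-clock seconds.
TIMINGS = defaultdict(list)


def collect_timing(func):
    """Record the wall-clock time of each call under the function's name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TIMINGS[func.__name__].append(time.perf_counter() - start)
    return wrapper


def calibrate(timings, op_counts):
    """Return seconds per counted operation for each stage with a count."""
    return {
        stage: sum(times) / op_counts[stage]
        for stage, times in timings.items()
        if op_counts.get(stage)  # skip stages with missing or zero counts
    }


class ToyWrangler:
    """Stand-in for a real wrangler; not any library's actual class."""

    @collect_timing
    def form_multipoles(self, n):
        # Dummy workload proportional to n.
        return sum(i * i for i in range(n))


wrangler = ToyWrangler()
wrangler.form_multipoles(10_000)

# In practice the counts would come from the performance model.
constants = calibrate(TIMINGS, {"form_multipoles": 10_000})
```

Wall-clock timing like this is only meaningful if the decorated method has finished its work when it returns. For backends that enqueue work asynchronously, the decorator would presumably need to wait for completion before stopping the clock.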
What I have in mind here are some static numbers in the code that are intended to be 'roughly accurate' (per-backend, and perhaps per-architecture) and that can be fed into the perf model.
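For illustration, the static numbers could be a plain table keyed by backend. The backend names, stage names, and values below are placeholders, not measurements, and `predict_time` is a hypothetical helper rather than an existing function of the perf model:

```python
# Placeholder calibration constants in seconds per counted operation.
# These values are made up for illustration and are NOT measurements.
CALIBRATION = {
    "backend_a": {"form_multipoles": 2.0e-6, "eval_direct": 5.0e-7},
    "backend_b": {"form_multipoles": 4.0e-8, "eval_direct": 1.0e-8},
}


def predict_time(op_counts, backend, calibration=CALIBRATION):
    """Scale per-stage operation counts by the constants for one backend."""
    params = calibration[backend]
    return {stage: params[stage] * count for stage, count in op_counts.items()}


# Operation counts as the perf model might report them (made-up numbers).
counts = {"form_multipoles": 1_000, "eval_direct": 20_000}
per_stage = predict_time(counts, "backend_a")
total = sum(per_stage.values())
```

A per-architecture variant could add a second key level (or a tuple key such as `(backend, architecture)`) without changing `predict_time` much.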