Updated scheduler benchmark to reflect a more involved vector operation.
Replaced x = y + z with x = alpha * y + beta * z, which involves more dispatching. The overhead is still below 0.5 µs on my desktop machine, and hence small compared to kernel launch overhead.
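
For readers who want to see how a per-operation overhead figure like this can be estimated, here is a minimal sketch. It is not the scheduler benchmark from this commit: `axpby` and `time_per_call` are hypothetical helper names, and the code times plain Python rather than the scheduler. The idea it illustrates is to repeat the operation many times on very small vectors, so that the arithmetic is negligible and the average time per call is dominated by dispatch overhead.

```python
import time


def axpby(alpha, y, beta, z):
    """Return alpha * y + beta * z elementwise for two equal-length lists."""
    if len(y) != len(z):
        raise ValueError("y and z must have the same length")
    return [alpha * yi + beta * zi for yi, zi in zip(y, z)]


def time_per_call(fn, repeats=10000):
    """Return the average wall-clock time in seconds per call of fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Tiny vectors keep the arithmetic cheap, so the measured time
    # mostly reflects per-call overhead rather than computation.
    y = [1.0, 2.0]
    z = [3.0, 4.0]
    seconds = time_per_call(lambda: axpby(2.0, y, 3.0, z))
    print(f"average time per call: {seconds * 1e6:.3f} us")
```

The printed number depends on the machine and the interpreter, so it is not comparable to the 0.5 µs figure above; only the measurement pattern carries over.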