Updated benchmark to use equivalent workloads for CPU and GPU.
* NumPy performs element-wise operations by default, so the CPU operation now uses pure NumPy.
* Eliminated the loop, which is not needed to demonstrate parallelism on array operations.
* Made the number of workers explicit instead of leaving the choice to the GPU, via a local_size variable passed to kernel execution.
* Increased the input to ~8 million data points to show the difference between CPU- and GPU-based computation more clearly.
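
The CPU side of the changes above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the operation `a * b + a`, the names `cpu_elementwise` and `num_work_groups`, the exact size `2 ** 23`, and `LOCAL_SIZE = 256` are all assumptions chosen for the example. The helper only checks that the global size splits evenly into work-groups of `local_size`, which classic OpenCL-style kernel launches require.

```python
import time

import numpy as np

N = 2 ** 23        # 8,388,608 elements; assumed stand-in for "~8 million"
LOCAL_SIZE = 256   # hypothetical work-group size; the commit's value is not given


def cpu_elementwise(a, b):
    """Element-wise work done entirely in NumPy, with no Python-level loop.

    The expression itself is an assumed placeholder for the benchmarked operation.
    """
    return a * b + a


def num_work_groups(global_size, local_size):
    """Return how many work-groups of local_size cover global_size exactly."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(N, dtype=np.float32)
    b = rng.random(N, dtype=np.float32)

    start = time.perf_counter()
    result = cpu_elementwise(a, b)
    elapsed = time.perf_counter() - start

    print(f"CPU (NumPy) on {N} elements: {elapsed:.4f} s")
    print(f"work-groups of size {LOCAL_SIZE}: {num_work_groups(N, LOCAL_SIZE)}")
```

Picking a power-of-two input size keeps it divisible by common work-group sizes, so the same array can be handed to the GPU kernel without padding.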