MIC: Enhanced kernel parameters for BLAS levels 1, 2, 3.
The parameters were not obtained from a full-fledged optimizer run, but from careful manual tweaking. The memory bandwidth obtained for BLAS levels 1 and 2 is about 70 GB/sec, which is acceptable given that vector operations on the Xeon Phi are slow with OpenCL. BLAS level 3 improves only mildly and peaks at about 40 GFLOP/sec. Given that OpenCL on Xeon Phi (KNC) sees limited use and that everybody is eager for KNL, further tuning efforts are suspended. Resolves #26.