Custom access pattern knl can now be produced from a requested stride together with the requested subgroup size, data size, buswidth, and bps; sets ldim0 and ldim1 to create the desired pattern with a 2D copy if possible.