GEMM: Substantially improved pure CPU-based implementation.
About a factor of 20 faster than the previous implementation. I estimate that further micro-tuning could gain another factor of 2; larger performance gains will most likely require intrinsics.
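This is a minimal sketch of what a cache-blocked, pure-C GEMM of this general kind can look like. It is NOT the code from this commit. The function name `gemm_blocked`, the row-major layout, the `double` element type, and the 64 x 64 x 64 tile sizes are all hypothetical choices made for illustration. It computes C = alpha * A * B + beta * C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile sizes; a real implementation would tune these to the
   target's L1/L2 cache sizes. */
#define TILE_M 64
#define TILE_N 64
#define TILE_K 64

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical sketch: C = alpha * A * B + beta * C, row-major storage.
   A is m x k, B is k x n, C is m x n. */
void gemm_blocked(size_t m, size_t n, size_t k, double alpha,
                  const double *A, const double *B, double beta, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* Apply beta once up front; beta == 0 overwrites C (BLAS convention),
       so stale NaN/Inf values already in C do not propagate. */
    for (size_t t = 0; t < m * n; ++t)
        C[t] = (beta == 0.0) ? 0.0 : beta * C[t];

    /* Tile all three loops so the working blocks of A, B and C stay in cache. */
    for (size_t i0 = 0; i0 < m; i0 += TILE_M) {
        size_t i1 = min_size(i0 + TILE_M, m);
        for (size_t p0 = 0; p0 < k; p0 += TILE_K) {
            size_t p1 = min_size(p0 + TILE_K, k);
            for (size_t j0 = 0; j0 < n; j0 += TILE_N) {
                size_t j1 = min_size(j0 + TILE_N, n);
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t p = p0; p < p1; ++p) {
                        /* i-p-j order: the inner loop walks contiguous rows
                           of B and C, which compilers can auto-vectorize. */
                        double a = alpha * A[i * k + p];
                        for (size_t j = j0; j < j1; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
                }
            }
        }
    }
}
```

In a sketch like this, "micro-tuning" would typically mean adjusting the tile sizes and unrolling the inner loops, while explicit SIMD intrinsics would replace the innermost `j` loop.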