simple custom access pattern example now predicts tiled matmul fairly well...
simple custom access pattern example now predicts tiled matmul fairly well using only custom mem access kernels