Extended scheduler to support more vector operations, added benchmark.
Scheduler overhead is currently in the 150 ns range, including kernel launch overhead. The only supported operations are x {+,-, }= y (i.e. x += y, x -= y, x = y) and x = y +- z.
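
For reference, a minimal sketch of the elementwise semantics of the supported statement forms, reading the shorthand as x = y, x += y, x -= y, x = y + z and x = y - z (that reading is an assumption). This is plain Python for illustration only, not the scheduler's code; the names `assign` and `assign_binary` are hypothetical.

```python
from typing import List

Vector = List[float]  # stand-in for the real vector type (assumption)


def assign(x: Vector, y: Vector, op: str = "=") -> None:
    """Apply x = y, x += y, or x -= y elementwise, in place."""
    if len(x) != len(y):
        raise ValueError("size mismatch")
    for i, yi in enumerate(y):
        if op == "=":
            x[i] = yi
        elif op == "+=":
            x[i] += yi
        elif op == "-=":
            x[i] -= yi
        else:
            raise ValueError(f"unsupported operation: {op}")


def assign_binary(x: Vector, y: Vector, z: Vector, op: str = "+") -> None:
    """Apply x = y + z or x = y - z elementwise, in place."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("size mismatch")
    if op not in ("+", "-"):
        raise ValueError(f"unsupported operation: {op}")
    sign = 1.0 if op == "+" else -1.0
    for i in range(len(x)):
        x[i] = y[i] + sign * z[i]
```

Anything outside these five forms raises an error in the sketch, mirroring the restriction stated above.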