Unified operations of the form v1 = v2 @ alpha +- v3 @ beta, with vectors v1, v2, v3, @ denoting either multiplication or division, and the scalars alpha and beta each being either a CPU or a GPU scalar. Only four kernels are now needed for this (down from eight), which also eliminates a number of temporaries we previously had to accept or work around.
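
For illustration only, here is a minimal host-side sketch of what such a unified operation could look like. All names (`ScaleOp`, `scale`, `avbv`) are hypothetical and not taken from the actual code base, and the CPU/GPU scalar distinction is not modeled: both scalars are plain `double` values here. The point is that one routine with a few flags covers every multiply/divide and add/subtract combination in a single pass, without intermediate vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is a sketch, not the project's API.
enum class ScaleOp { Multiply, Divide };

// Apply "x @ s", where @ is either multiplication or division.
inline double scale(double x, double s, ScaleOp op) {
    return op == ScaleOp::Multiply ? x * s : x / s;
}

// Compute v1 = v2 @ alpha +- v3 @ beta element-wise in one loop,
// so no temporary vector is needed for either scaled operand.
inline void avbv(std::vector<double>& v1,
                 const std::vector<double>& v2, double alpha, ScaleOp op_alpha,
                 const std::vector<double>& v3, double beta, ScaleOp op_beta,
                 bool subtract) {
    assert(v1.size() == v2.size() && v2.size() == v3.size());
    for (std::size_t i = 0; i < v1.size(); ++i) {
        const double a = scale(v2[i], alpha, op_alpha);
        const double b = scale(v3[i], beta, op_beta);
        v1[i] = subtract ? a - b : a + b;
    }
}
```

In this sketch the operator choices are runtime flags. A real GPU implementation might instead encode them in an options argument passed to the kernel, which is one way the number of distinct kernels could be kept small.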