Commit aadb5b72 authored by Karl Rupp

Scan: Refurbished CUDA and OpenCL implementations.

Now uses only three kernels and one temporary buffer rather than the
previous approach with four kernels and two temporary vectors(!).
Also prepared an explicit API for in-place scans.
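
For readers unfamiliar with the pattern, a three-phase scan with a single temporary buffer can be sketched on the CPU as follows. This is only an illustration of the general technique, not the kernels from this commit: the function names, the block size, the choice of an inclusive scan, and the use of `int` are all assumptions made for the example. Each function stands in for one kernel launch, and `block_sums` plays the role of the one temporary buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Phase 1 (stand-in for kernel 1, hypothetical name): every block scans its
// own chunk in place and writes the chunk total to block_sums[block].
void scan_blocks(std::vector<int>& data, std::vector<int>& block_sums,
                 std::size_t block_size) {
    for (std::size_t b = 0; b < block_sums.size(); ++b) {
        std::size_t begin = b * block_size;
        std::size_t end = std::min(begin + block_size, data.size());
        int running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            running += data[i];
            data[i] = running;  // inclusive scan within the block
        }
        block_sums[b] = running;
    }
}

// Phase 2 (stand-in for kernel 2): exclusive scan of the block totals, in
// place, so block_sums[b] becomes the offset to add to block b.
void scan_block_sums(std::vector<int>& block_sums) {
    int running = 0;
    for (int& s : block_sums) {
        int total = s;
        s = running;
        running += total;
    }
}

// Phase 3 (stand-in for kernel 3): add each block's offset to its elements.
void add_block_offsets(std::vector<int>& data,
                       const std::vector<int>& block_sums,
                       std::size_t block_size) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] += block_sums[i / block_size];
    }
}

// In-place inclusive scan using three passes and one temporary buffer.
void inclusive_scan_inplace(std::vector<int>& data, std::size_t block_size = 4) {
    if (data.empty() || block_size == 0) return;
    std::size_t num_blocks = (data.size() + block_size - 1) / block_size;
    std::vector<int> block_sums(num_blocks);  // the single temporary buffer
    scan_blocks(data, block_sums, block_size);
    scan_block_sums(block_sums);
    add_block_offsets(data, block_sums, block_size);
}
```

On a GPU, each loop over blocks or elements would be spread across work groups or threads. The sequential loops here only show the data flow between the three phases.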

Possible further optimizations:
 - Non-in-place scans can run without a temporary buffer
 - Small vectors can run with only one kernel invocation and no temporary buffer
 - Test suite for scans needs more love.
parent 525fc3ae