P2P (List 1) is not actually core-parallel
As of f76a30ee (with sumpy@a15072cb), running `examples/layerpot-3d.py` does not use all cores during the List 1 (P2P) evaluation. (I'm observing this using htop with pocl 0.13 on stout.) Other kernels seem to parallelize fine, and the generated OpenCL looks like it does the right thing.
(The log makes it look like List 2 is happening, but `perf top -z` reveals that it's in `__pocl_launcher_p2p_from_csr`.)
cc @mattwala