test_tree.py: Warm-cache run spends a third of its time setting up scan kernels
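From a warm-cache run of test_tree.py, about 88.5 s of the 267.3 s total is spent in scan.py:873(__init__), i.e. in setting up scan kernels: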
```
>>> s.sort_stats("cumtime").print_stats("scan", 10)
Fri Jun 9 22:37:44 2017    test_tree.prof

         55694928 function calls (53486051 primitive calls) in 267.293 seconds

   Ordered by: cumulative time
   List reduced from 13272 to 47 due to restriction <'scan'>
   List reduced from 47 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      288    0.018    0.000   88.514    0.307 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:873(__init__)
      288    0.043    0.000   88.463    0.307 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1060(finish_setup)
      576    0.034    0.000   64.130    0.111 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1251(build_scan_kernel)
      864    0.015    0.000   26.962    0.031 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:831(_make_template)
       51    0.002    0.000   16.272    0.319 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1617(build_inner)
       39    0.001    0.000   11.490    0.295 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/algorithm.py:816(get_scan_kernel)
     1126    0.095    0.000    5.101    0.005 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1300(__call__)
   822528    0.375    0.000    0.496    0.000 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:834(replace_id)
        5    0.000    0.000    0.064    0.013 <decorator-gen-56>:1(_make_sort_scan_type)
        1    0.000    0.000    0.064    0.064 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/algorithm.py:286(_make_sort_scan_type)
```
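For anyone wanting to reproduce or track this number, here is a minimal sketch of how a profile like the one above could be captured and summarized with the standard-library cProfile and pstats modules. The helper names (profile_call, report, cumtime_fraction) are hypothetical. cumtime_fraction reads Stats.stats and Stats.total_tt, which are long-standing but undocumented attributes of pstats.Stats; the "in 267.293 seconds" figure in the header is total_tt.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) under cProfile and return (result, profiler)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    return result, profiler


def report(source, pattern="scan", limit=10):
    """Return a cumtime-sorted report restricted to entries matching `pattern`.

    `source` may be a path to a saved .prof file or a cProfile.Profile object;
    the defaults mirror the print_stats("scan", 10) call used above.
    """
    stream = io.StringIO()
    stats = pstats.Stats(source, stream=stream)
    stats.sort_stats("cumtime").print_stats(pattern, limit)
    return stream.getvalue()


def cumtime_fraction(source, file_suffix, lineno, func_name):
    """Fraction of the total profiled time spent inside one function.

    The function is identified by the end of its file path, its line number,
    and its name: the same triple pstats prints as filename:lineno(function).
    Returns 0.0 if no entry matches.
    """
    stats = pstats.Stats(source)
    # Stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers);
    # ct is the cumulative time, i.e. the "cumtime" column.
    for (filename, line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        if filename.endswith(file_suffix) and line == lineno and name == func_name:
            return ct / stats.total_tt
    return 0.0
```

Applied to the saved profile, cumtime_fraction("test_tree.prof", "pyopencl/scan.py", 873, "__init__") should come out at roughly 0.33 (88.514 s / 267.293 s). Matching on file, line, and name rather than name alone matters here, since scan.py defines more than one __init__.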
This also seems to have a non-trivial impact on runtime in pytential and sumpy right now.