test_tree.py: Warm-cache run spends a third of its time setting up scan kernels

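From a warm-cache run of test_tree.py, where constructing the scan kernels (scan.py:873(__init__)) accounts for 88.5 s of the 267.3 s total, or about 33%: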

>>> s.sort_stats("cumtime").print_stats("scan", 10)
Fri Jun  9 22:37:44 2017    test_tree.prof

         55694928 function calls (53486051 primitive calls) in 267.293 seconds

   Ordered by: cumulative time
   List reduced from 13272 to 47 due to restriction <'scan'>
   List reduced from 47 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      288    0.018    0.000   88.514    0.307 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:873(__init__)
      288    0.043    0.000   88.463    0.307 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1060(finish_setup)
      576    0.034    0.000   64.130    0.111 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1251(build_scan_kernel)
      864    0.015    0.000   26.962    0.031 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:831(_make_template)
       51    0.002    0.000   16.272    0.319 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1617(build_inner)
       39    0.001    0.000   11.490    0.295 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/algorithm.py:816(get_scan_kernel)
     1126    0.095    0.000    5.101    0.005 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:1300(__call__)
   822528    0.375    0.000    0.496    0.000 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/scan.py:834(replace_id)
        5    0.000    0.000    0.064    0.013 <decorator-gen-56>:1(_make_sort_scan_type)
        1    0.000    0.000    0.064    0.064 /home/matt/src/env-3.4/lib/python3.4/site-packages/pyopencl-2017.1.1-py3.4-linux-x86_64.egg/pyopencl/algorithm.py:286(_make_sort_scan_type)

This also seems to have a non-trivial impact on runtime in pytential and sumpy right now.
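
For anyone who wants to produce a listing like the one above on their own workload, here is a minimal sketch using only the standard-library cProfile and pstats modules. The function `scan_setup_stand_in` and the helper `profile_to_report` are hypothetical names made up for illustration. They are not part of pyopencl or boxtree, and the stand-in workload does no real kernel setup.

```python
import cProfile
import io
import os
import pstats
import tempfile


def scan_setup_stand_in(n):
    # Hypothetical workload standing in for the real scan kernel setup.
    return sum(i * i for i in range(n))


def profile_to_report(func, *args, pattern="scan", limit=10):
    """Profile func(*args) and return a pstats report filtered by pattern."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    # Dump to a .prof file (like test_tree.prof), then reload it for inspection.
    fd, path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    try:
        profiler.dump_stats(path)
        out = io.StringIO()
        stats = pstats.Stats(path, stream=out)
        # Same query as in the listing: sort by cumulative time, keep entries
        # whose "filename:lineno(function)" string matches the pattern,
        # then keep at most `limit` of those.
        stats.sort_stats("cumtime").print_stats(pattern, limit)
    finally:
        os.remove(path)
    return out.getvalue()


report = profile_to_report(scan_setup_stand_in, 100000)
print(report)
```

The string restriction is treated as a regular expression matched against the full `filename:lineno(function)` entry, so "scan" selects everything in scan.py as well as any function with "scan" in its name.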