Investigate loopy slowness in test_stokes
A run of test_exterior_stokes with a warm cache spends about 44% of its time (154.7 s of 350.2 s cumulative) in cl_kernel_info:

    Tue May 16 00:00:40 2017    stokes.prof

             278440157 function calls (241904286 primitive calls) in 350.224 seconds

       Ordered by: cumulative time
       List reduced from 11125 to 40 due to restriction <40>
       List reduced from 40 to 13 due to restriction <'loopy'>

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
         8001    0.054    0.000  158.109    0.020 /home/wala1/src/loopy/loopy/kernel/__init__.py:1455(__call__)
         8001    0.069    0.000  156.974    0.020 /home/wala1/src/loopy/loopy/target/pyopencl_execution.py:704(__call__)
          184    0.007    0.000  154.740    0.841 /home/wala1/src/loopy/loopy/target/pyopencl_execution.py:646(cl_kernel_info)
          184    0.294    0.002  109.709    0.596 /home/wala1/src/loopy/loopy/execution.py:140(get_typed_and_scheduled_kernel)
          184    0.003    0.000   39.202    0.213 /home/wala1/src/loopy/loopy/codegen/__init__.py:375(generate_code_v2)
          182    0.577    0.003   38.551    0.212 /home/wala1/src/loopy/loopy/type_inference.py:463(infer_unknown_types)
          184    0.003    0.000   35.959    0.195 /home/wala1/src/loopy/loopy/preprocess.py:1950(preprocess_kernel)
          184    0.003    0.000   34.696    0.189 /home/wala1/src/loopy/loopy/schedule/__init__.py:1967(get_one_scheduled_kernel)
          552    0.010    0.000   27.042    0.049 /home/wala1/src/loopy/loopy/kernel/__init__.py:1532(update_persistent_hash)
        50354    0.602    0.000   25.026    0.000 /home/wala1/src/loopy/loopy/type_inference.py:385(_infer_var_type)
       176662    1.599    0.000   21.812    0.000 /home/wala1/src/loopy/loopy/kernel/instruction.py:816(update_persistent_hash)
          552    0.001    0.000   20.856    0.038 /home/wala1/src/loopy/loopy/kernel/__init__.py:1570(__ne__)
          552    0.213    0.000   20.855    0.038 /home/wala1/src/loopy/loopy/kernel/__init__.py:1548(__eq__)

I don't fully trust these numbers because the test appears to have failed when running on CUDA, but they point to something worth investigating.
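
For anyone who wants to reproduce a listing like the one above, here is a minimal sketch using the standard-library cProfile and pstats modules. The two "List reduced" lines in the output are what pstats prints when `print_stats` is given the restrictions `40` and `'loopy'`. The `workload` function and the helper names are hypothetical stand-ins, not part of loopy or the test suite; the real profile would come from running the test itself under the profiler.

```python
import cProfile
import io
import os
import pstats
import tempfile


def workload():
    # Hypothetical stand-in for the real test run.
    return sum(i * i for i in range(10000))


def profile_to_file(func, path):
    # Run func under cProfile and dump the raw stats to path.
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(path)


def format_report(path, limit=40, pattern="loopy"):
    # Sort by cumulative time, keep the top `limit` entries, then keep
    # only those whose "filename:lineno(function)" matches `pattern`.
    out = io.StringIO()
    stats = pstats.Stats(path, stream=out)
    stats.sort_stats("cumulative").print_stats(limit, pattern)
    return out.getvalue()


if __name__ == "__main__":
    fd, prof_path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    try:
        profile_to_file(workload, prof_path)
        # The pattern is "workload" here only because the stand-in code
        # has no loopy frames; with stokes.prof one would pass "loopy".
        print(format_report(prof_path, 40, "workload"))
    finally:
        os.remove(prof_path)
```

With an existing dump such as stokes.prof, only `format_report("stokes.prof")` would be needed. The restrictions are applied in order, so the `'loopy'` filter only sees the 40 entries with the largest cumulative time.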