Codegen much slower due to do_access_ranges_overlap_conservative()

Here is a profile of a recent cold-cache run of some pytential code. This code builds and runs an order-20 FMM with the Laplace kernel.

Thu Mar  8 14:00:44 2018    out.prof

         475425154 function calls (447088610 primitive calls) in 390.339 seconds

   Ordered by: cumulative time
   List reduced from 10357 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1669/1    0.092    0.000  390.582  390.582 {built-in method builtins.exec}
        1    0.000    0.000  390.582  390.582 test.py:1(<module>)
        1    0.000    0.000  390.104  390.104 test.py:107(test_qbx_cauchy_integral)
        2    0.001    0.000  390.097  195.048 /Users/matt/src/conformal-map-paper/code/qbx.py:28(qbx_cauchy_integral)
6849590/3644    2.566    0.000  389.342    0.107 /Users/matt/miniconda3/envs/inteq/lib/python3.6/site-packages/pytools/__init__.py:569(wrapper)
    57/25    0.000    0.000  384.512   15.380 /Users/matt/src/pytential/pytential/symbolic/execution.py:303(__call__)
    57/25    0.003    0.000  384.512   15.380 /Users/matt/src/pytential/pytential/symbolic/compiler.py:365(execute)
        6    0.000    0.000  383.562   63.927 /Users/matt/src/pytential/pytential/qbx/__init__.py:492(exec_compute_potential_insn)
        6    0.001    0.000  383.557   63.926 /Users/matt/src/pytential/pytential/qbx/__init__.py:556(exec_compute_potential_insn_fmm)
        6    0.002    0.000  379.547   63.258 /Users/matt/src/pytential/pytential/qbx/fmm.py:360(drive_fmm)
      445    0.005    0.000  286.728    0.644 /Users/matt/src/loopy/loopy/kernel/__init__.py:1262(__call__)
      445    0.005    0.000  286.549    0.644 /Users/matt/src/loopy/loopy/target/pyopencl_execution.py:314(__call__)
       65    0.005    0.000  286.332    4.405 /Users/matt/src/loopy/loopy/target/pyopencl_execution.py:273(kernel_info)
5203139/712804    7.307    0.000  237.544    0.000 /Users/matt/miniconda3/envs/inteq/lib/python3.6/site-packages/pymbolic/mapper/__init__.py:114(__call__)
       65    0.001    0.000  221.180    3.403 /Users/matt/src/loopy/loopy/target/execution.py:762(get_typed_and_scheduled_kernel)
       25    0.026    0.001  211.100    8.444 /Users/matt/src/loopy/loopy/target/execution.py:728(get_typed_and_scheduled_kernel_uncached)
       25    0.001    0.000  184.095    7.364 /Users/matt/src/loopy/loopy/schedule/__init__.py:2051(get_one_scheduled_kernel)
       25    0.000    0.000  179.949    7.198 /Users/matt/src/loopy/loopy/schedule/__init__.py:2038(_get_one_scheduled_kernel_inner)
       50    0.001    0.000  179.949    3.599 /Users/matt/src/loopy/loopy/schedule/__init__.py:1837(generate_loop_schedules)
180399/180352    0.049    0.000  177.455    0.001 {built-in method builtins.next}
       50    0.005    0.000  177.282    3.546 /Users/matt/src/loopy/loopy/schedule/__init__.py:1854(generate_loop_schedules_inner)
       25    0.001    0.000  168.311    6.732 /Users/matt/src/loopy/loopy/check.py:595(pre_schedule_checks)
    22001    2.010    0.000  164.969    0.007 /Users/matt/src/loopy/loopy/symbolic.py:1583(get_access_range)
25163/25121    0.258    0.000  164.552    0.007 /Users/matt/src/loopy/loopy/symbolic.py:1697(map_subscript)
       25    0.005    0.000  162.958    6.518 /Users/matt/src/loopy/loopy/check.py:560(check_variable_access_ordered)
       25    0.233    0.009  162.953    6.518 /Users/matt/src/loopy/loopy/check.py:446(_check_variable_access_ordered_inner)
     8202    0.054    0.000  158.409    0.019 /Users/matt/src/loopy/loopy/symbolic.py:1806(do_access_ranges_overlap_conservative)
    52278    0.066    0.000  157.768    0.003 /Users/matt/src/loopy/loopy/symbolic.py:1753(__call__)
    16404    0.343    0.000  157.614    0.010 /Users/matt/src/loopy/loopy/symbolic.py:1769(_get_access_range_conservative)
       24    0.001    0.000   93.877    3.912 /Users/matt/src/sumpy/sumpy/tools.py:366(get_cached_optimized_kernel)

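In particular, nearly 158 seconds of cumulative time (about 40% of the 390-second run) are spent in /Users/matt/src/loopy/loopy/symbolic.py:1753(__call__), which is part of AccessRangeMapper. Almost all of that comes from do_access_ranges_overlap_conservative(), reached via check_variable_access_ordered() in pre_schedule_checks().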
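
For anyone who wants to dig into a dump like out.prof themselves, here is a minimal sketch using only the standard-library cProfile and pstats modules. The helper names (profile_to_file, top_cumulative, callers_of) are made up for illustration and are not part of loopy or pytential. top_cumulative() should produce a listing in the same format as the one above (sorted by cumulative time, restricted to 30 entries), and callers_of() shows which call sites lead into a given function.

```python
import cProfile
import io
import pstats


def profile_to_file(func, path):
    """Run func() under cProfile and dump the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)


def top_cumulative(path, limit=30):
    """Return the `limit` most expensive entries by cumulative time as text."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


def callers_of(path, pattern):
    """Return the callers of every profiled function matching `pattern` (a regex)."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.sort_stats("cumulative").print_callers(pattern)
    return buf.getvalue()
```

Assuming the dump from this report is available as out.prof, something like print(callers_of("out.prof", "do_access_ranges_overlap_conservative")) would list the callers of the function in question along with the time attributed to each.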