3D P2P could use some constant folding
Following up on #73 (closed), there's one `pow` in the computation of the scaling constant at the beginning of the kernel that gets generated for the same case as described in #73 (closed). `perf top -z -a -p PID` reveals that it is pointlessly eating at least 14% CPU (where I'm guessing that `pow` is implemented as `exp(a*log(b))`):
67.81% p2p_from_csr.so [.] _pocl_launcher_p2p_from_csr
7.21% libm-2.24.so [.] __ieee754_ilogb
7.04% libm-2.24.so [.] __ldexp
5.55% libm-2.24.so [.] __scalbn
3.70% libm-2.24.so [.] __ilogb
2.91% libc-2.24.so [.] __isinf
2.40% p2p_from_csr.so [.] ilogb@plt
1.14% p2p_from_csr.so [.] isnan@plt
1.08% p2p_from_csr.so [.] ldexp@plt
0.72% p2p_from_csr.so [.] isinf@plt
0.35% libc-2.24.so [.] __isnan
cc @mattwala
Edited by Andreas Klöckner