Faster 3D M2Ls using precomputed rotation matrices
This change uses the mplocquadu2_trunc version of the M2L translation operators from fmmlib. These operators use a precomputed rotation matrix to speed up the rotations in the "point and shoot" translation.
To obtain the rotation matrices, the traversal now computes the rotation angles needed for the M2L translations. It records the translation vectors when List 2 is built, then computes the "rotation class" of each translation vector.
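As a rough illustration of the classification step (not the code in this change), the sketch below groups translation vectors by the polar angle they make with the z-axis. It assumes the vectors are integer lattice offsets between box centers and that the polar angle alone determines the rotation class. The function name `rotation_classes` and the exact-fraction key are hypothetical choices made for this example.

```python
import math
from fractions import Fraction


def rotation_classes(translation_vectors):
    """Group integer 3D translation vectors by polar angle to the z-axis.

    Hypothetical helper: returns (class_ids, angles) such that
    angles[class_ids[i]] is the angle for translation_vectors[i].
    """
    key_to_class = {}
    angles = []
    class_ids = []

    for vec in translation_vectors:
        x, y, z = (int(c) for c in vec)
        norm_sq = x * x + y * y + z * z
        if norm_sq == 0:
            raise ValueError("translation vector must be nonzero")

        # Exact key: sign of z plus cos^2 of the polar angle as a fraction.
        # This avoids floating point mismatches between parallel vectors
        # such as (1, 0, 1) and (2, 0, 2).
        sign_z = (z > 0) - (z < 0)
        key = (sign_z, Fraction(z * z, norm_sq))

        if key not in key_to_class:
            key_to_class[key] = len(angles)
            angles.append(math.acos(z / math.sqrt(norm_sq)))
        class_ids.append(key_to_class[key])

    return class_ids, angles


vectors = [(0, 0, 2), (0, 0, 4), (2, 0, 0), (0, 3, 0), (3, 4, 5), (1, 0, 1)]
class_ids, angles = rotation_classes(vectors)
```

Because the offsets in List 2 lie on a lattice, many of them share an angle, so the number of distinct classes is much smaller than the number of translations.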
The FMMLIB wrangler gains a new optional "geometry data" parameter which supplies the rotation classes for List 2. The wrangler uses this information to precompute rotation matrices when doing the M2L. Because the matrices can get quite large, there is a memory cutoff threshold beyond which the wrangler falls back to the regular version.
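To give a feel for why a cutoff is needed, here is a minimal sketch of the fallback decision. It is not the wrangler's actual logic. It assumes each rotation matrix holds (order + 1)^2 * (2 * order + 1) double-precision entries and that a forward and a backward matrix are stored per rotation class. The function names and the cutoff value are hypothetical.

```python
def rotation_matrix_bytes(order, n_classes, bytes_per_entry=8, n_directions=2):
    """Estimate storage for precomputed rotation matrices.

    Assumed layout (not confirmed by this change): one matrix with
    (order + 1)**2 * (2 * order + 1) entries per class and direction.
    """
    entries_per_matrix = (order + 1) ** 2 * (2 * order + 1)
    return n_classes * n_directions * entries_per_matrix * bytes_per_entry


def choose_m2l_variant(order, n_classes, cutoff_bytes):
    """Return "precomputed" if the matrices fit under the cutoff, else "regular"."""
    if rotation_matrix_bytes(order, n_classes) <= cutoff_bytes:
        return "precomputed"
    return "regular"


# Hypothetical cutoff of 10 MB with 100 rotation classes.
variant_order_10 = choose_m2l_variant(order=10, n_classes=100, cutoff_bytes=10**7)
variant_order_20 = choose_m2l_variant(order=20, n_classes=100, cutoff_bytes=10**7)
```

Under these assumptions the storage grows roughly with the cube of the order, which is why higher orders hit the cutoff first.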
Some timing results on my laptop, taken from the test test_fmm_with_optimized_3d_m2l:
Laplace, 10^4 sources and 10^4 targets:
Order 10:
Baseline M2L time : 6.351 s
Optimized M2L time: 3.327 s
Order 20:
Baseline M2L time : 34.12 s
Optimized M2L time: 19.19 s
Helmholtz, 10^4 sources and 10^4 targets:
Order 10:
Baseline M2L time : 22.36 s
Optimized M2L time: 19.63 s
Order 20:
Baseline M2L time : 142 s
Optimized M2L time: 130.9 s
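The speedup factors implied by the timings above can be computed with a short script like the following. The numbers are copied from the results listed here, and the dictionary layout is just a convenience for this example.

```python
# (kernel, order) -> (baseline seconds, optimized seconds), from the results above
timings = {
    ("Laplace", 10): (6.351, 3.327),
    ("Laplace", 20): (34.12, 19.19),
    ("Helmholtz", 10): (22.36, 19.63),
    ("Helmholtz", 20): (142.0, 130.9),
}

# Speedup = baseline time / optimized time
speedups = {
    key: baseline / optimized
    for key, (baseline, optimized) in timings.items()
}

for (kernel, order), speedup in speedups.items():
    print(f"{kernel}, order {order}: {speedup:.2f}x")
```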
- Point requirements.txt back to pyfmmlib master after pyfmmlib!13 (merged) is merged
- Run pytential against this, make sure CI passes (pytential!152 (merged), pytential!161 (closed))