Asymptotically better algorithm for M2M (In both compressed and full)
On master, M2M is O(p^2d) for order p and dimension d, but CSE
reduces this down to O(p^{2d-1}) sometimes.
For eg: in Helmholtz 2D, full taylor is O(p^{2d-1}) and
HelmholtzConformingTaylor is O(p^{2d-1.5}).
This commit produces expressions in O(p^{2d-1}) consistently
regardless of how good CSE is and O(p^{2d-2}) for compressed.
The new algorithm uses the observation that M2M coefficients have the form in 2D,
B_{m, n} = \sum_{i\le m, j\le n} A_{i, j} d_x^i d_y^j \binom{m}{i} \binom{n}{j}
and can be rewritten as follows,
Let T_{m, n} = \sum_{i\le m} A_{i, n} d_x^i \binom{m}{i}.
Then, B_{m, n} = \sum_{j\le n} T_{m, j} d_y^j \binom{n}{j}
and T_{m, n} are p^2 number of temporary variables that are
reused for different M2M coefficients and costs p per variable.
Total cost for calculating T_{m, n} is p^3 and similar for B_{m, n}