Rework block generator
This reworks parts of the block generator so that it stores all the blocks in one large linear array. It's still a work in progress, but I wanted to post it because I'm not sure I'm doing the scans and setup properly.
It's just for P2P at the moment, and it basically does the following (in `P2PMatrixBlockGenerator.__call__`):
- Creates a little kernel that computes a cumsum of block sizes.
- Creates a little kernel that computes a set of `(rowindices, colindices)` that are meant to map from the linear index to the row and column in the full matrix.
- Uses the row and column indices to fill in the `results` array with all the block matrix elements. This is a bit nicer now because we can split a big iname and scale better (hopefully).
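For readers who want to see the layout concretely, here is a minimal NumPy sketch of the three steps above. It is not the loopy code from this MR: the function name `linearize_blocks`, the `blkranges` array, and the dense matrix `A` standing in for the P2P kernel evaluation are all assumptions made for illustration.

```python
import numpy as np


def linearize_blocks(row_blocks, col_blocks):
    """Hypothetical helper: flatten a list of (rows, cols) blocks."""
    # Step 1: cumulative sum of block sizes gives each block's offset
    # into the linear array (with a leading 0).
    sizes = np.array([len(r) * len(c) for r, c in zip(row_blocks, col_blocks)])
    blkranges = np.concatenate([[0], np.cumsum(sizes)])

    # Step 2: for every linear index, record the row and column it maps to
    # in the full matrix (row-major ordering inside each block).
    rowindices = np.empty(blkranges[-1], dtype=np.int64)
    colindices = np.empty(blkranges[-1], dtype=np.int64)
    for i, (r, c) in enumerate(zip(row_blocks, col_blocks)):
        s = slice(blkranges[i], blkranges[i + 1])
        rowindices[s] = np.repeat(r, len(c))
        colindices[s] = np.tile(c, len(r))

    return blkranges, rowindices, colindices


# A dense matrix stands in for the kernel evaluation (assumption).
A = np.arange(20.0).reshape(4, 5)
row_blocks = [np.array([0, 1]), np.array([2, 3])]
col_blocks = [np.array([0, 1, 2]), np.array([3, 4])]

blkranges, rowindices, colindices = linearize_blocks(row_blocks, col_blocks)

# Step 3: fill the linear results array in one flat pass.
results = A[rowindices, colindices]

# Recover block 0 from the linear storage.
block0 = results[blkranges[0]:blkranges[1]].reshape(2, 3)
```

Because step 3 is a single flat loop over `results`, it is the part that would map onto one big iname that can be split.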
What I was wondering:
- Is there a cleaner way to construct the cumulative `(rowindices, colindices)`? I couldn't think of any.
- Should I merge the first two loopy kernels? I separated them because the first one basically computes the size of `rowindices` and `colindices`, and I couldn't think of another way to tell loopy what size they're supposed to be.
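One possible alternative for the first question, sketched in NumPy rather than loopy: compute the row and column for each linear index independently, by locating its block with a binary search over the cumulative sizes. Whether this is actually cleaner (or expressible nicely in loopy) is an open question; the function name `linear_to_rowcol` and the flat `rows_flat` / `cols_flat` storage are assumptions, not part of this MR. The size of the index arrays still has to be known up front, so this does not remove the need for the cumsum pass.

```python
import numpy as np


def linear_to_rowcol(row_blocks, col_blocks):
    """Hypothetical helper: per-element mapping without a loop over blocks."""
    nrows = np.array([len(r) for r in row_blocks])
    ncols = np.array([len(c) for c in col_blocks])

    # Offsets of each block in the linear array, and of each block's
    # row / column index lists in their flat storage.
    blkranges = np.concatenate([[0], np.cumsum(nrows * ncols)])
    row_starts = np.concatenate([[0], np.cumsum(nrows)])
    col_starts = np.concatenate([[0], np.cumsum(ncols)])
    rows_flat = np.concatenate(row_blocks)
    cols_flat = np.concatenate(col_blocks)

    # Every linear index k finds its own block, then its local
    # (row, col) position inside that block (row-major).
    k = np.arange(blkranges[-1])
    blk = np.searchsorted(blkranges, k, side="right") - 1
    local = k - blkranges[blk]
    rowindices = rows_flat[row_starts[blk] + local // ncols[blk]]
    colindices = cols_flat[col_starts[blk] + local % ncols[blk]]
    return rowindices, colindices


rows, cols = linear_to_rowcol(
    [np.array([0, 1]), np.array([2, 3])],
    [np.array([0, 1, 2]), np.array([3, 4])],
)
```

Each entry depends only on `k` and the small offset arrays, so the work per element is independent, at the cost of a search over the block offsets.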
Edited by Alexandru Fikl