Streaming prefetch in add_prefetch/precompute
Adds a "streaming" option to `add_prefetch` and `precompute`. It changes `precompute` so that data already in local memory is copied over between iterations of the outer loop of a split iname, wherever it overlaps with what the next iteration needs. This avoids redundant global reads. At the moment this is exposed as an optional `stream_iname` parameter (naming the outer iname) to `add_prefetch` and `precompute`. When it is not `None`, the streaming version is generated. (There are a number of spots in my implementation that could, or should, be improved.)
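The following sketch shows the idea on a made-up example. It is plain Python, not loopy and not this MR's implementation. The example is a 1D three-point stencil whose index is split as `i = i_outer * tile + i_inner`, and the sketch counts reads from the "global" array `a`. The names `stencil_refetch`, `stencil_streaming`, `HALO` and `tile` are all hypothetical.

```python
# Hypothetical illustration (plain Python, not loopy): non-streaming vs.
# streaming prefetch for out[i] = a[i] + a[i+1] + a[i+2], with the output
# index split as i = i_outer * tile + i_inner.

HALO = 2  # a 3-point stencil needs 2 entries beyond each tile


def stencil_refetch(a, tile):
    """Refill the whole local buffer from global memory on every outer iteration."""
    n_out = len(a) - HALO
    assert n_out % tile == 0, "sketch assumes the split divides evenly"
    out = [0.0] * n_out
    global_reads = 0
    for i_outer in range(n_out // tile):
        base = i_outer * tile
        buf = [a[base + k] for k in range(tile + HALO)]  # full window fetch
        global_reads += tile + HALO
        for i_inner in range(tile):
            out[base + i_inner] = sum(buf[i_inner:i_inner + HALO + 1])
    return out, global_reads


def stencil_streaming(a, tile):
    """Reuse the overlap already in the local buffer; fetch only new entries."""
    n_out = len(a) - HALO
    assert n_out % tile == 0, "sketch assumes the split divides evenly"
    out = [0.0] * n_out
    buf = [0.0] * (tile + HALO)
    global_reads = 0
    for i_outer in range(n_out // tile):
        base = i_outer * tile
        if i_outer == 0:
            first_new = 0  # nothing to reuse yet: fetch the full window
        else:
            # The previous window's last HALO entries are a[base:base + HALO],
            # which are exactly this window's first HALO entries: copy locally.
            for k in range(HALO):
                buf[k] = buf[tile + k]
            first_new = HALO
        # Fetch only the entries not already present in local memory.
        for k in range(first_new, tile + HALO):
            buf[k] = a[base + k]
            global_reads += 1
        for i_inner in range(tile):
            out[base + i_inner] = sum(buf[i_inner:i_inner + HALO + 1])
    return out, global_reads


if __name__ == "__main__":
    a = [float(x) for x in range(18)]
    out_r, reads_r = stencil_refetch(a, tile=4)
    out_s, reads_s = stencil_streaming(a, tile=4)
    print(out_r == out_s, reads_r, reads_s)
```

Both variants compute the same result. For 18 input values and `tile=4`, the refetching version performs 24 global reads (6 per outer iteration). The streaming version performs 18, so each input element is read from global memory exactly once.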
Edited by Andreas Klöckner