Prefetching through indirect accesses
Consider:
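import loopy as lp

# Write 2 into rows of a selected through two levels of indirection:
# indirect[i] picks a row of map1, and map1[row, j] picks a row of a.
knl = lp.make_kernel(
    "{[i, j, k]: 0 <= i,k < 10 and 0<=j<2}",
    """
    for i, j, k
        a[map1[indirect[i], j], k] = 2
    end
    """,
    [lp.GlobalArg("a", strides=(2, 1), dtype=int),
     lp.GlobalArg("map1", shape=(10,10), dtype=int),
     lp.GlobalArg("indirect", shape=(10, ), dtype=int)],
    target=lp.CTarget())
# Nest the loops as i (outermost), then j, then k (innermost).
knl = lp.prioritize_loops(knl, "i,j,k")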
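For reference, here is a plain-Python sketch (not loopy) of what this kernel computes. It assumes the strides (2, 1) on a are element strides and that map1 is stored row-major. Under those assumptions a[r, k] sits at flat offset 2*r + k and map1[r, c] at 10*r + c, which matches the index arithmetic a[2 * map1[10 * indirect_fetch + j] + k] in the generated C code. The array contents are made up.

```python
import random

# Loop bounds from the domain "{[i, j, k]: 0 <= i,k < 10 and 0<=j<2}"
N_I, N_J, N_K = 10, 2, 10


def flat_a(r, k):
    # a has strides (2, 1), assumed to be in elements: a[r, k] -> 2*r + 1*k
    return 2 * r + k


def flat_map1(r, c):
    # map1 has shape (10, 10), assumed row-major: map1[r, c] -> 10*r + c
    return 10 * r + c


def reference_kernel(a, map1, indirect):
    """Plain-Python version of `for i, j, k: a[map1[indirect[i], j], k] = 2`.

    a and map1 are flat lists. Returns the flat offsets of a that were
    written, in loop order.
    """
    written = []
    for i in range(N_I):
        for j in range(N_J):
            for k in range(N_K):
                row = map1[flat_map1(indirect[i], j)]
                off = flat_a(row, k)
                a[off] = 2
                written.append(off)
    return written


# Made-up inputs: indirect is a permutation of 0..9 and every map1 entry
# is a valid row index of a (0..9).
random.seed(0)
indirect = random.sample(range(10), 10)
map1 = [random.randrange(10) for _ in range(100)]
a = [0] * (flat_a(9, 9) + 1)  # largest reachable offset is 2*9 + 9 = 27

written = reference_kernel(a, map1, indirect)
print(len(written), max(written) <= 27)  # prints: 200 True
```

Each (i, j, k) triple performs exactly one write, giving 10 × 2 × 10 = 200 writes in total.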
Now I'd like to prefetch indirect into the i loop (the compiler would probably do this one for me anyway, I guess):
knl = lp.add_prefetch(knl, "indirect[i]")
Which gives code like:
void loopy_kernel(long *__restrict__ a, long const *__restrict__ map1, long const *__restrict__ indirect)
{
  long indirect_fetch;
  for (int i = 0; i <= 9; ++i)
  {
    indirect_fetch = indirect[i];
    for (int j = 0; j <= 1; ++j)
      for (int k = 0; k <= 9; ++k)
        a[2 * map1[10 * indirect_fetch + j] + k] = 2;
  }
}
Which is fine. But now I'd like to prefetch map1 out of the k loop, so that I get:
void loopy_kernel(long *__restrict__ a, long const *__restrict__ map1, long const *__restrict__ indirect)
{
  long indirect_fetch;
  long map1_fetch;
  for (int i = 0; i <= 9; ++i)
  {
    indirect_fetch = indirect[i];
    for (int j = 0; j <= 1; ++j)
    {
      // loaded once per (i, j), hoisted out of the k loop
      map1_fetch = map1[10 * indirect_fetch + j];
      for (int k = 0; k <= 9; ++k)
        a[2 * map1_fetch + k] = 2;
    }
  }
}
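The following plain-Python sketch (not loopy) shows why this hoist is safe and what it saves. The row index map1[10 * indirect_fetch + j] depends only on i and j. Loading it once per (i, j) rather than once per (i, j, k) therefore produces the same writes to a, with 10 × 2 = 20 loads of map1 instead of 10 × 2 × 10 = 200. The CountingArray helper and the array contents are hypothetical and exist only to count loads. As in the C code above, map1 is assumed to be row-major, and the k loop is nested inside the j loop.

```python
import random

N_I, N_J, N_K = 10, 2, 10  # bounds from the kernel's domain


class CountingArray:
    """Hypothetical helper: a flat read-only array that counts element loads."""

    def __init__(self, data):
        self.data = list(data)
        self.loads = 0

    def __getitem__(self, idx):
        self.loads += 1
        return self.data[idx]


def before_hoist(map1, indirect):
    # Mirrors the first generated kernel: indirect_fetch is hoisted into i,
    # but map1 is still loaded inside the k loop.
    # Returns the flat offsets of a that would be written, in loop order.
    written = []
    for i in range(N_I):
        indirect_fetch = indirect[i]
        for j in range(N_J):
            for k in range(N_K):
                written.append(2 * map1[10 * indirect_fetch + j] + k)
    return written


def after_hoist(map1, indirect):
    # Mirrors the desired kernel: map1 is loaded once per (i, j), outside k.
    written = []
    for i in range(N_I):
        indirect_fetch = indirect[i]
        for j in range(N_J):
            map1_fetch = map1[10 * indirect_fetch + j]
            for k in range(N_K):
                written.append(2 * map1_fetch + k)
    return written


# Made-up inputs, as flat lists.
random.seed(1)
indirect_data = random.sample(range(10), 10)
map1_data = [random.randrange(10) for _ in range(100)]

m_before, m_after = CountingArray(map1_data), CountingArray(map1_data)
w_before = before_hoist(m_before, indirect_data)
w_after = after_hoist(m_after, indirect_data)
print(w_before == w_after, m_before.loads, m_after.loads)  # prints: True 200 20
```

Both versions write to the same offsets in the same order, so the transformation changes only how often map1 is read.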
This one doesn't work:
knl = lp.add_prefetch(knl, "map1[:, j]")
data/lmitche1/src/firedrake/src/loopy/loopy/kernel/__init__.py in get_leaf_domain_indices(self, inames)
584
585 for iname in inames:
--> 586 home_domain_index = hdm[iname]
587 if home_domain_index in domain_indices:
588 # nothin' new
KeyError: 'indirect_fetch'
> /data/lmitche1/src/firedrake/src/loopy/loopy/kernel/__init__.py(586)get_leaf_domain_indices()
584
585 for iname in inames:
--> 586 home_domain_index = hdm[iname]
587 if home_domain_index in domain_indices:
588 # nothin' new
I guess the problem is that these indirect accesses don't define inames, so it's a little tricky to figure things out?