
Prefetching through indirect accesses

Consider:

import loopy as lp
knl = lp.make_kernel("{[i, j, k]: 0 <= i,k < 10 and 0<=j<2}",
"""
for i, j, k
    a[map1[indirect[i], j], k] = 2
end
""",
[lp.GlobalArg("a", strides=(2, 1), dtype=int),
 lp.GlobalArg("map1", shape=(10,10), dtype=int),
 lp.GlobalArg("indirect", shape=(10, ), dtype=int)], target=lp.CTarget())

knl = lp.prioritize_loops(knl, "i,j,k")

Now I'd like to prefetch indirect into the i loop (the compiler would probably do this one for me anyway):

knl = lp.add_prefetch(knl, "indirect[i]")

Which gives code like:

void loopy_kernel(long *__restrict__ a, long const *__restrict__ map1, long const *__restrict__ indirect)
{
  long indirect_fetch;

  for (int i = 0; i <= 9; ++i)
  {
    indirect_fetch = indirect[i];
    for (int j = 0; j <= 1; ++j)
      for (int k = 0; k <= 9; ++k)
        a[2 * map1[10 * indirect_fetch + j] + k] = 2;
  }
}
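For reference, the index arithmetic in the generated code follows from the declared layouts. The plain-Python sketch below (not loopy code) assumes map1 is stored row-major, so its strides are (10, 1); a uses the strides=(2, 1) given in the kernel. The helper name flat_index is hypothetical and only serves the illustration.

```python
def flat_index(indices, strides):
    """Return the flat offset of a multi-dimensional index for the given strides."""
    return sum(i * s for i, s in zip(indices, strides))


# Assumption: map1 has shape (10, 10) and a row-major layout, i.e. strides (10, 1).
MAP1_STRIDES = (10, 1)
# a is declared with strides=(2, 1) in the kernel.
A_STRIDES = (2, 1)

indirect_fetch, j, k = 3, 1, 4  # arbitrary sample values

# Matches the generated expression 10 * indirect_fetch + j.
map1_offset = flat_index((indirect_fetch, j), MAP1_STRIDES)
print(map1_offset)  # 31

# Matches the generated expression 2 * map1[...] + k, with a sample row value of 5.
a_offset = flat_index((5, k), A_STRIDES)
print(a_offset)  # 14
```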

That is fine. But now I'd like to prefetch map1 out of the k loop, so that I get:

void loopy_kernel(long *__restrict__ a, long const *__restrict__ map1, long const *__restrict__ indirect)
{
  long indirect_fetch;
  long map1_fetch;
  for (int i = 0; i <= 9; ++i)
  {
    indirect_fetch = indirect[i];
    for (int j = 0; j <= 1; ++j)
    {
      map1_fetch = map1[10 * indirect_fetch + j];
      for (int k = 0; k <= 9; ++k)
        a[2 * map1_fetch + k] = 2;
    }
  }
}
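To make the intent concrete, here is a plain-Python model (not loopy code) of the kernel before and after hoisting the map1 load out of the k loop. The input data and the function names make_inputs, run_baseline and run_hoisted are made up for this sketch. It only checks that the two loop nests write the same values, which is the property the prefetch should preserve.

```python
def make_inputs():
    """Build small deterministic inputs matching the shapes in the kernel."""
    indirect = [(3 * i + 1) % 10 for i in range(10)]  # values in 0..9
    map1 = [(7 * n + 2) % 50 for n in range(100)]     # flattened (10, 10) array
    return indirect, map1


def run_baseline(indirect, map1, size):
    """Loop nest with both indirections evaluated inside the k loop."""
    a = [0] * size
    for i in range(10):
        for j in range(2):
            for k in range(10):
                a[2 * map1[10 * indirect[i] + j] + k] = 2
    return a


def run_hoisted(indirect, map1, size):
    """Loop nest with indirect fetched per i and map1 fetched per (i, j)."""
    a = [0] * size
    for i in range(10):
        indirect_fetch = indirect[i]
        for j in range(2):
            map1_fetch = map1[10 * indirect_fetch + j]
            for k in range(10):
                a[2 * map1_fetch + k] = 2
    return a


indirect, map1 = make_inputs()
size = 2 * max(map1) + 10  # large enough for the biggest index 2 * max(map1) + 9
assert run_baseline(indirect, map1, size) == run_hoisted(indirect, map1, size)
```

The map1 load depends only on i and j, so it is invariant in k and can be moved out of the innermost loop without changing the result.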

This attempt doesn't work; it fails with a KeyError:

knl = lp.add_prefetch(knl, "map1[:, j]")
data/lmitche1/src/firedrake/src/loopy/loopy/kernel/__init__.py in get_leaf_domain_indices(self, inames)
    584 
    585         for iname in inames:
--> 586             home_domain_index = hdm[iname]
    587             if home_domain_index in domain_indices:
    588                 # nothin' new

KeyError: 'indirect_fetch'
> /data/lmitche1/src/firedrake/src/loopy/loopy/kernel/__init__.py(586)get_leaf_domain_indices()
    584 
    585         for iname in inames:
--> 586             home_domain_index = hdm[iname]
    587             if home_domain_index in domain_indices:
    588                 # nothin' new

I guess the problem is that these indirect accesses don't define inames (the KeyError is raised for indirect_fetch, which is looked up as if it were an iname), so it's a little tricky to figure things out?