
push iname to innermost

This is somewhat related to #129 and !245, as I'm trying to avoid tagging with ilp.seq.

Consider this nested loop:

import numpy as np
import loopy as lp

knl = lp.make_kernel([
        "{ [i]: 0 <= i < 10}",
        "{ [j]: 0 <= j < 20 }",
        "{ [elem]: 0 <= elem < 4}"
    ],
    """
    for elem
        for i
            a[i*4 + elem] = 1 {id = i}
        end
        for j
            b[j*4 + elem] = 1 {id = j}
        end
    end
    """,
    target=lp.CTarget()
)
knl = lp.add_and_infer_dtypes(knl, {"a": np.dtype(np.float64)})

I want to make the elem loop innermost. I can do this simply with knl = lp.tag_inames(knl, {"elem": "ilp.seq"}), which generates:

void loopy_kernel(double *__restrict__ a, int *__restrict__ b)
{
  for (int j = 0; j <= 19; ++j)
    for (int elem = 0; elem <= 3; ++elem)
      b[4 * j + elem] = 1;
  for (int i = 0; i <= 9; ++i)
    for (int elem = 0; elem <= 3; ++elem)
      a[4 * i + elem] = 1.0;
}

But I'm not sure how to achieve this with prioritize_loops. For example, knl = lp.prioritize_loops(knl, "i, j, elem") still leaves elem outermost:

void loopy_kernel(double *__restrict__ a, int *__restrict__ b)
{
  for (int elem = 0; elem <= 3; ++elem)
  {
    for (int i = 0; i <= 9; ++i)
      a[4 * i + elem] = 1.0;
    for (int j = 0; j <= 19; ++j)
      b[4 * j + elem] = 1;
  }
}
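
To make the difference between the two listings concrete, here is a plain-Python model of both loop nests. It does not use loopy at all; the function names and the `writes` list are made up for illustration. It only checks that the two orderings perform the same writes in a different order, so the elem-innermost version walks a and b contiguously while the elem-outermost version has stride 4.

```python
# Plain-Python model of the two generated C loop nests (no loopy needed).
# All names here are illustrative, not part of any loopy API.
N_I, N_J, N_ELEM = 10, 20, 4


def elem_innermost():
    """Loop order from the ilp.seq-tagged kernel: elem is innermost."""
    writes = []
    for j in range(N_J):
        for elem in range(N_ELEM):
            writes.append(("b", 4 * j + elem))
    for i in range(N_I):
        for elem in range(N_ELEM):
            writes.append(("a", 4 * i + elem))
    return writes


def elem_outermost():
    """Loop order from the prioritize_loops attempt: elem is outermost."""
    writes = []
    for elem in range(N_ELEM):
        for i in range(N_I):
            writes.append(("a", 4 * i + elem))
        for j in range(N_J):
            writes.append(("b", 4 * j + elem))
    return writes


inner = elem_innermost()
outer = elem_outermost()

# Both nests perform the same (array, index) writes, each exactly once ...
same_writes = sorted(inner) == sorted(outer)
# ... but not in the same order (contiguous vs. stride-4 access).
same_order = inner == outer
print(same_writes, same_order)  # prints: True False
```

Since every write is independent of the others, either order is valid; the question is only how to get loopy to pick the first one without the ilp.seq tag.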

Do I need to write the kernel differently, for instance with separate elem1 and elem2 inames for the i and j loops, or maybe use duplicate_inames()?

Many thanks,

-TJ