Kernel printing: hard to tell where loops start and end

For example, printing the kernel from test_rob_stroud_bernstein_full gives the output below. From this output, it is hard to tell at a glance that every instruction is nested inside el_inner:

---------------------------------------------------------------------------
KERNEL: loopy_kernel
---------------------------------------------------------------------------
ARGUMENTS:
coeffs: GlobalArg, type: <runtime>, shape: unknown
nels: ValueArg, type: <runtime>
qpts: GlobalArg, type: <runtime>, shape: (2, 7), dim_tags: (N1:stride:7, N0:stride:1)
result: GlobalArg, type: <runtime>, shape: (nels, 7, 7), dim_tags: (N2:stride:49, N1:stride:7, N0:stride:1)
---------------------------------------------------------------------------
DOMAINS:
[nels] -> { [i2, alpha1, alpha2, i1_2, alpha1_2, i2_2, el_inner, el_outer_outer, el_outer_inner] : 0 <= i2 <= 6 and 0 <= alpha1 <= 4 and 0 <= alpha2 <= 4 - alpha1 and 0 <= i1_2 <= 6 and 0 <= alpha1_2 <= 4 and 0 <= i2_2 <= 6 and 0 <= el_inner <= 15 and 0 <= el_outer_inner <= 1 and -el_inner - 32el_outer_outer <= 16el_outer_inner < nels - el_inner - 32el_outer_outer }
---------------------------------------------------------------------------
INAME IMPLEMENTATION TAGS:
alpha1: unr
alpha1_2: None
alpha2: unr
el_inner: l.0
el_outer_inner: ilp.unr
el_outer_outer: g.0
i1_2: None
i2: l.1
i2_2: None
---------------------------------------------------------------------------
TEMPORARIES:
aind: type: <auto>, shape: () scope:auto
r: type: <auto>, shape: () scope:auto
r2: type: <auto>, shape: () scope:auto
s: type: <auto>, shape: () scope:auto
s2: type: <auto>, shape: () scope:auto
tmp: type: <auto>, shape: (5, 7), dim_tags: (N1:stride:7, N0:stride:1) scope:auto
w: type: <auto>, shape: () scope:auto
w2: type: <auto>, shape: () scope:auto
xi: type: <auto>, shape: () scope:auto
xi2: type: <auto>, shape: () scope:auto
---------------------------------------------------------------------------
INSTRUCTIONS:
↱↱     [el_inner,el_outer_inner,el_outer_outer,i2]
││                                          xi <- qpts[1, i2]   # insn
└│↱↱   [el_inner,el_outer_inner,el_outer_outer,i2]
 │││                                        s <- 1 + (-1)*xi   # insn_0
↱└└│   [el_inner,el_outer_inner,el_outer_outer,i2]
│  │                                        r <- xi / s   # insn_1
│↱ │   [el_inner,el_outer_inner,el_outer_outer,i2]
││ │                                        aind <- 0   # aind_init
││↱└   [alpha1,el_inner,el_outer_inner,el_outer_outer,i2]
│││                                         w <- s**(4 + (-1)*alpha1)   # init_w
│││↱↱↱ [alpha1,el_inner,el_outer_inner,el_outer_outer,i2]
││││││                                      tmp[alpha1, i2] <- tmp[alpha1, i2] + w*coeffs[aind]   # write_tmp
└│└└││↱[alpha1,alpha2,el_inner,el_outer_inner,el_outer_outer,i2]
 │  │││                                     w <- w*r*(4 + (-1)*alpha1 + (-1)*alpha2) / (1 + alpha2)   # update_w
↱└  └│└[alpha1,alpha2,el_inner,el_outer_inner,el_outer_outer,i2]
│    │                                      aind <- aind + 1   # aind_incr
└↱↱  │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
 ││  │                                      xi2 <- qpts[0, i1_2]   # insn_2
↱└│↱ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
│ ││ │                                      s2 <- 1 + (-1)*xi2   # insn_3
└↱└│ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
 │ │ │                                      r2 <- xi2 / s2   # insn_4
 │ └ │ [el_inner,el_outer_inner,el_outer_outer,i1_2]
 │   │                                      w2 <- s2**4   # insn_5
 │   └ [alpha1_2,el_inner,el_outer_inner,el_outer_outer,i1_2,i2_2]
 │                                          result[el_inner + (el_outer_inner + el_outer_outer*2)*16, i1_2, i2_2] <- result[el_inner + (el_outer_inner + el_outer_outer*2)*16, i1_2, i2_2] + w2*tmp[alpha1_2, i2_2]   # insn_6
 └     [alpha1_2,el_inner,el_outer_inner,el_outer_outer,i1_2]
                                            w2 <- w2*r2*(4 + (-1)*alpha1_2) / (1 + alpha1_2)   # insn_7
---------------------------------------------------------------------------

There are two ways of going about this (that I can think of):

  1. infer as much of a nesting order as we can at this stage, and always print out inames in the nesting order
  2. use a different color for each iname in the list
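To make option (2) concrete, here is a rough sketch of per-iname coloring using ANSI escape codes. It works on plain sets of iname names and does not touch loopy itself; `assign_colors`, `colorize`, and the six-color palette are hypothetical names and choices made up for illustration.

```python
import itertools

# Hypothetical palette: ANSI foreground codes for red, green, yellow,
# blue, magenta, cyan. With more inames than colors, colors repeat.
ANSI_COLORS = [31, 32, 33, 34, 35, 36]
RESET = "\033[0m"


def assign_colors(all_inames):
    """Give every iname a stable color, cycling through the palette."""
    palette = itertools.cycle(ANSI_COLORS)
    return {iname: code for iname, code in zip(sorted(all_inames), palette)}


def colorize(inames, colors):
    """Render one instruction's iname list with each iname in its color."""
    parts = [f"\033[{colors[name]}m{name}{RESET}" for name in sorted(inames)]
    return "[" + ",".join(parts) + "]"


colors = assign_colors({"el_inner", "el_outer_inner", "el_outer_outer", "i2"})
print(colorize({"el_inner", "i2"}, colors))
```

The kernel above has nine inames, so a six-color palette would already repeat, and the colors say nothing about nesting by themselves.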

I am inclined to go with (1), although it can still mislead: when several nesting orders are valid, the scheduler may pick a different one from the one printed. Perhaps a separate visual marker could flag an ambiguous nesting order, such as surrounding equivalently nested inames in {}?
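A minimal sketch of what (1) with the {} marker might look like, assuming the only information available is the set of inames each instruction sits in. It is not loopy code: `iname_users` and `format_within` are hypothetical helpers, and "used by more instructions means further out" is a heuristic, not what the scheduler does. Inames used by exactly the same instructions are treated as equivalently nested and grouped in {}.

```python
from collections import defaultdict


def iname_users(insn_inames):
    """Map each iname to the frozenset of instruction ids nested in it."""
    users = defaultdict(set)
    for insn_id, inames in insn_inames.items():
        for iname in inames:
            users[iname].add(insn_id)
    return {iname: frozenset(ids) for iname, ids in users.items()}


def format_within(inames, users):
    """Print inames outermost first, wrapping ambiguous groups in {}."""
    groups = defaultdict(list)
    for iname in inames:
        groups[users[iname]].append(iname)
    # Heuristic: an iname shared by more instructions is assumed to be
    # further out. Names break ties so the output is deterministic.
    ordered = sorted(groups.items(),
                     key=lambda kv: (-len(kv[0]), sorted(kv[1])))
    parts = []
    for _, names in ordered:
        names = sorted(names)
        parts.append("{" + ",".join(names) + "}" if len(names) > 1
                     else names[0])
    return "[" + ",".join(parts) + "]"


# A few instructions from the kernel above, with their iname sets.
el = {"el_inner", "el_outer_inner", "el_outer_outer"}
insn_inames = {
    "insn": el | {"i2"},
    "init_w": el | {"i2", "alpha1"},
    "update_w": el | {"i2", "alpha1", "alpha2"},
    "insn_2": el | {"i1_2"},
}
users = iname_users(insn_inames)
print(format_within(insn_inames["update_w"], users))
# prints [{el_inner,el_outer_inner,el_outer_outer},i2,alpha1,alpha2]
```

With this ordering the el_* inames always lead each list, which makes it visible that every instruction is inside them. Two inames whose instruction sets differ but do not contain one another would still be ordered by size here, so a real implementation would need a proper check for that case.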