Sequential loop bounds generation does not obey projection semantics

    import loopy as lp

    knl = lp.make_kernel(
        "{[i, loc1, loc2]: 0 <= loc1 <= 1 and 0 <= loc2 <= 2"
        " and 0 <= i <= loc1 and 0 <= i <= loc2}",
        """
        <>tmp[loc2] = 0

        for i
          tmp[i] = 1 {inames=i:loc2}
        end

        out[loc2] = tmp[loc2]
        """,
        "...",
        seq_dependencies=True)

    # Map both loc1 and loc2 to the same local axis; keep tmp in local memory.
    knl = lp.tag_inames(knl, dict(loc1="l.0", loc2="l.0"))
    knl = lp.set_temporary_scope(knl, "tmp", "local")

This kernel returns [1, 1, 0], but I would expect it to return [1, 1, 1]. The root cause appears to be that the loop bounds for i are too tight: they depend on both loc1 and loc2. I think they should depend only on loc2, to be consistent with loopy's projection semantics.
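To make the difference in bounds concrete, here is a minimal sketch in plain Python (no loopy or islpy involved) that enumerates the integer points of the domain above. It compares the values of i obtained by projecting out loc1 with those obtained when loc1 and loc2 are pinned to the same index. The helper names are hypothetical, and pinning both inames to one index is my assumption about what tagging both as l.0 amounts to.

```python
from itertools import product

# Integer points of the domain from the kernel above:
# 0 <= loc1 <= 1, 0 <= loc2 <= 2, 0 <= i <= loc1, 0 <= i <= loc2
DOMAIN = [
    (i, loc1, loc2)
    for i, loc1, loc2 in product(range(3), range(2), range(3))
    if i <= loc1 and i <= loc2
]


def i_values_projected(loc2):
    """i values for this loc2 when loc1 is projected out (hypothetical helper)."""
    return sorted({i for i, _, l2 in DOMAIN if l2 == loc2})


def i_values_both_inames(lid):
    """i values when loc1 and loc2 are both pinned to lid (assumed l.0 mapping)."""
    return sorted({i for i, l1, l2 in DOMAIN if l1 == lid and l2 == lid})


for lid in range(3):
    print(lid, i_values_projected(lid), i_values_both_inames(lid))
```

The two sets agree for indices 0 and 1. For index 2, bounds that also depend on loc1 leave no iterations of i at all, while the projection onto (i, loc2) still allows i = 0 and i = 1.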

(Part of) the fix is that get_usable_inames_for_conditional() should use only the parallel inames common to the whole block. Right now it returns both loc1 and loc2.
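A rough sketch of what "common set of parallel inames" could mean, again in plain Python. This is not loopy's implementation of get_usable_inames_for_conditional(); the function name and the per-instruction iname sets below are assumptions read off the kernel text above.

```python
def common_parallel_inames(insn_inames, parallel_inames):
    """Parallel inames shared by every instruction in a block (hypothetical helper)."""
    common = set(parallel_inames)
    for inames in insn_inames:
        common &= set(inames)
    return frozenset(common)


# Assumed inames per instruction, in kernel order:
#   tmp[loc2] = 0          -> {loc2}
#   tmp[i] = 1             -> {i, loc2}
#   out[loc2] = tmp[loc2]  -> {loc2}
block = [{"loc2"}, {"i", "loc2"}, {"loc2"}]
usable = common_parallel_inames(block, {"loc1", "loc2"})
print(sorted(usable))
```

Under these assumptions loc1 drops out because no instruction is nested inside it, so only loc2 would remain available for the conditional.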

(Possibly) related: #68 !107 #64

Edited by Matt Wala