Documentation Notes
^^^^^^^^^^^^^^^^^^^

- Need to clarify fundamental difference between constants baked into
  code and things that remain variable. (ISL parameters, symbolic shapes)

Things to consider
^^^^^^^^^^^^^^^^^^

- Dependencies are pointwise for shared loop dimensions
  and global over non-shared ones (between dependent and ancestor)

- Multiple insns could fight over which iname gets local axis 0
  -> complicated optimization problem

- Every loop in loopy is opened at most once.
  Too restrictive?

- Loop bounds currently may not depend on parallel dimensions.
  Does it make sense to relax this?

- Why do CSEs necessarily have to duplicate the inames?
  -> because that would be necessary for a sequential prefetch

- Cannot do slab decomposition on inames that share a tag with
  other inames
  -> Is that reasonable?

- Parallel dimension splitting/merging via tags
  -> unnecessary?

- Not using all hw loop dimensions causes an error, as
  is the case for variant 3 in the rank_one test.

- Measure efficiency of corner cases

To-do
^^^^^

- Just touching a variable written to by a non-idempotent
  instruction makes that instruction also not idempotent

- assert dependencies <= parent_inames in loopy/__init__.py ???

- user interface for dim length prescription

- Deal with equality constraints.
  (These arise, e.g., when partitioning a loop of length
  16 into 16s.)

Future ideas
^^^^^^^^^^^^

- Better for loop bound generation
  -> Try a triangular loop

- Sharing of checks across ILP instances

- Eliminate the first (pre-)barrier in a loop.

- Generate automatic test against sequential code.

- Automatically verify that all array access is within bounds.

- Reason about generated code, give user feedback on potential
  improvements.

- Convolutions, Stencils

- DMA engine threads?

- Divisibility, modulo, strides?

- Try, fix indirect addressing

- variable shuffle detection

- Use gists (why do disjoint sets arise?)

- Nested slab decomposition (in conjunction with conditional hoisting) could
  generate nested conditional code.

Dealt with
^^^^^^^^^^

- Give a good error message if a parameter assignment in get_problems()
  is missing.

- Slab decomposition for ILP
  -> I don't think that's possible.

- It is hard to understand error messages that refer to instructions that
  are generated during preprocessing.
  -> Expose preprocessing to the user so she can inspect the preprocessed
  kernel.

- Which variables need to be duplicated for ILP?
  -> Only reduction

- implemented_domain may end up being smaller than requested in cse
  evaluations--check that!

- Allow prioritization of loops in scheduling.

- Make axpy better.

- Screwy lower bounds in slab decomposition

- reimplement add_prefetch

- Flag, exploit idempotence

- Some things involving CSEs might be impossible to schedule:
  a[i,j] = cse(b[i]) * cse(c[j])

- Be smarter about automatic local axis choice
  -> What if we run out of axes?

- Implement condition hoisting
  (needed, e.g., by slab decomposition)

- Check for non-use of hardware axes

- Slab decomposition for parallel dimensions

  - implement at the outermost nesting level regardless
  - bound *all* tagged inames
  - can't slab inames that share tags with other inames (for now)

- Make syntax for iname dependencies

- make syntax for insn dependencies

- Implement get_problems()

- CSE iname duplication might be unnecessary?
  (don't think so: It might be desired to do a full fetch before a mxm k loop
  even if that requires going iterative.)

- Reduction needs to know a neutral element

- Types of reduction variables?

- Generalize reduction to be over multiple variables

Should a dependency on an iname be forced in a CSE?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Local var:

| l  | n
| g  | y
| dl | Err
| d  | Err

Private var:

| l  | y
| g  | y
| dl | Err
| d  | Err

dg: Invalid -> error

| d: is duplicate
| l: is tagged as local idx
| g: is tagged as group idx

Raise error if dl is targeting a private variable, regardless of whether it's
a dependency or not.
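
The decision table above could be written down as a small lookup. This is only a sketch: it assumes that "y" means the dependency is forced, "n" means it is not, and "Err" means the combination is rejected. The names ``FORCE_DEPENDENCY``, ``ERROR_TAGS`` and ``must_force_dependency`` are made up for illustration and are not part of loopy.

```python
# Hypothetical encoding of the table above; none of these names exist in loopy.
# Keys are (variable kind, iname tag); values say whether the CSE should be
# forced to depend on the iname ("y" -> True, "n" -> False).
FORCE_DEPENDENCY = {
    ("local", "l"): False,
    ("local", "g"): True,
    ("private", "l"): True,
    ("private", "g"): True,
}

# Tags listed as "Err" (dl, d) or "Invalid" (dg) for both kinds of variable.
ERROR_TAGS = {"dl", "d", "dg"}


def must_force_dependency(var_kind, iname_tag):
    """Return True if the dependency is forced, False if not.

    Raises ValueError for the combinations the table marks as errors,
    and for inputs the table does not cover at all.
    """
    if var_kind not in ("local", "private"):
        raise ValueError("unknown variable kind: %r" % (var_kind,))
    if iname_tag in ERROR_TAGS:
        raise ValueError(
            "iname tag %r is an error for %s variables" % (iname_tag, var_kind))
    try:
        return FORCE_DEPENDENCY[(var_kind, iname_tag)]
    except KeyError:
        raise ValueError("unknown iname tag: %r" % (iname_tag,))


if __name__ == "__main__":
    for kind in ("local", "private"):
        for tag in ("l", "g"):
            print(kind, tag, must_force_dependency(kind, tag))
```

Keeping the error tags in a separate set means the final rule (dl on a private variable is always an error) is covered without a special case.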