Documentation Notes
^^^^^^^^^^^^^^^^^^^

- Need to clarify the fundamental difference between constants baked into
  the code and things that remain variable (ISL parameters, symbolic shapes).

Things to consider
^^^^^^^^^^^^^^^^^^

- Dependencies are pointwise for shared loop dimensions and global over
  non-shared ones (between the dependent and its ancestor).

- Multiple instructions could fight over which iname gets local axis 0
  -> complicated optimization problem.

- Every loop in loopy is opened at most once. Too restrictive?

- Loop bounds currently may not depend on parallel dimensions.
  Does it make sense to relax this?

- Why do CSEs necessarily have to duplicate the inames?
  -> Because that would be necessary for a sequential prefetch.

- Cannot do slab decomposition on inames that share a tag with other inames.
  -> Is that reasonable?

- Parallel dimension splitting/merging via tags
  -> unnecessary?

- Not using all hardware loop dimensions causes an error, as is the case for
  variant 3 in the rank_one test.

- Measure efficiency of corner cases.

- Loopy as a data model for implementing custom rewritings.

- We won't generate barrier-needing WAW dependencies from an instruction to
  itself.

To-do
^^^^^

- Variable shuffle detection
  -> will need unification.

- Fix all tests.

- Automatically generate testing code that compares against sequential code.

- Deal with equality constraints. (These arise, e.g., when partitioning a loop
  of length 16 into 16s.)

- duplicate_dimensions can be implemented without having to muck around with
  individual constraints:

  - add_dims
  - move_dims
  - intersect

Future ideas
^^^^^^^^^^^^

- Float4 joining on fetch/store?

- How can one automatically generate something like microblocks?

- Better for-loop bound generation
  -> Try a triangular loop.

- Sharing of checks across ILP instances.

- Eliminate the first (pre-)barrier in a loop.

- Generate automatic test against sequential code.

- Automatically verify that all array access is within bounds.

- Reason about generated code; give the user feedback on potential
  improvements.

- Convolutions, stencils.

- DMA engine threads?

- Divisibility, modulo, strides?

- Try out and fix indirect addressing.

- Use gists (why do disjoint sets arise?).

- Nested slab decomposition (in conjunction with conditional hoisting) could
  generate nested conditional code.

Dealt with
^^^^^^^^^^

- Dimension joining.

- User interface for dimension length prescription.

- Restrict-to-sequential and tagging have nothing to do with each other.
  -> Removed SequentialTag and turned it into a separate computed kernel
  property.

- Just touching a variable written to by a non-idempotent instruction makes
  the touching instruction non-idempotent as well.
  -> Idempotent renamed to boostable.
  -> Done.

- Give the user control over which reduction inames are duplicated.

- ``assert dependencies <= parent_inames`` in ``loopy/__init__.py``
  -> Yes, this must be the case.
  -> If you include reduction inames.

- Give a good error message if a parameter assignment in ``get_problems()``
  is missing.

- Slab decomposition for ILP
  -> I don't think that's possible.

- It is hard to understand error messages that refer to instructions
  generated during preprocessing.
  -> Expose preprocessing to the user so she can inspect the preprocessed
  kernel.

- Which variables need to be duplicated for ILP?
  -> Only reduction variables.

- implemented_domain may end up being smaller than requested in CSE
  evaluations; check that!

- Allow prioritization of loops in scheduling.

- Make axpy better.

- Screwy lower bounds in slab decomposition.

- Reimplement add_prefetch.

- Flag, exploit idempotence.

- Some things involving CSEs might be impossible to schedule, e.g.
  ``a[i,j] = cse(b[i]) * cse(c[j])``.

- Be smarter about automatic local axis choice
  -> What if we run out of axes?

- Implement condition hoisting (needed, e.g., by slab decomposition).

- Check for non-use of hardware axes.

- Slab decomposition for parallel dimensions:

  - implement at the outermost nesting level regardless
  - bound *all* tagged inames
  - can't slab inames that share tags with other inames (for now)

- Make syntax for iname dependencies.

- Make syntax for instruction dependencies.

- Implement ``get_problems()``.

- CSE iname duplication might be unnecessary?
  (Don't think so: it might be desirable to do a full fetch before the k loop
  of a matrix-matrix multiply (mxm), even if that requires going iterative.)

- Reduction needs to know a neutral element.

- Types of reduction variables?

- Generalize reduction to be over multiple variables.

Should a dependency on an iname be forced in a CSE?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

==============  ==============  ================
Iname status    Local variable  Private variable
==============  ==============  ================
l               n               y
g               y               y
dl              Err             Err
d               Err             Err
==============  ==============  ================

Legend:

- y / n: the dependency is / is not forced
- Err: raise an error
- l: iname is tagged as a local index
- g: iname is tagged as a group index
- d: iname is a duplicate
- dg: invalid -> raise an error

Raise an error if dl is targeting a private variable, regardless of whether
it's a dependency or not.
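
The CSE dependency table can be read as a lookup keyed by the scope of the variable a CSE writes and the status of the iname in question. The Python sketch below encodes it that way. The names ``FORCE_DEPENDENCY`` and ``force_iname_dependency`` are hypothetical and not part of loopy's API; treating ``dg`` as an error for both scopes is an assumption based on the "invalid" note in the legend.

```python
# Hypothetical encoding of the CSE dependency table; these names are
# illustrative only and are not part of loopy's API.

# (variable scope, iname status) -> True (force the dependency),
# False (do not force it), or None (raise an error).
FORCE_DEPENDENCY = {
    ("local", "l"): False,
    ("local", "g"): True,
    ("local", "dl"): None,
    ("local", "d"): None,
    ("private", "l"): True,
    ("private", "g"): True,
    ("private", "dl"): None,
    ("private", "d"): None,
}


def force_iname_dependency(scope, status):
    """Decide whether a CSE writing a *scope* ("local" or "private")
    variable must depend on an iname with the given *status*."""
    if status == "dg":
        # Assumption: a duplicated group-tagged iname is invalid outright.
        raise ValueError("dg iname is invalid")
    if (scope, status) not in FORCE_DEPENDENCY:
        raise ValueError(f"unknown combination: {scope!r}, {status!r}")
    decision = FORCE_DEPENDENCY[(scope, status)]
    if decision is None:
        # Covers dl targeting a private variable, among other cases.
        raise ValueError(f"{status} iname not allowed for a {scope} variable")
    return decision


print(force_iname_dependency("local", "l"))    # False
print(force_iname_dependency("private", "l"))  # True
```

Keeping the table in a single dictionary means a change to one cell of the table touches exactly one entry, while the special cases from the legend stay in the function.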