MEMO 5.28 KiB
    Tim Warburton committed
    Documentation Notes
    ^^^^^^^^^^^^^^^^^^^
    
    - Need to clarify fundamental difference between constants baked into code
      and things that remain variable. (ISL parameters, symbolic shapes)
    
    
    Things to consider
    ^^^^^^^^^^^^^^^^^^
    
    - Dependencies are pointwise for shared loop dimensions
      and global over non-shared ones (between dependent and ancestor)
    
    
    - multiple insns could fight over which iname gets local axis 0
      -> complicated optimization problem
    
    
    - Every loop in loopy is opened at most once.
      Too restrictive?
    
    - Loop bounds currently may not depend on parallel dimensions
      Does it make sense to relax this?
    
    
    - Why do CSEs necessarily have to duplicate the inames?
      -> because that would be necessary for a sequential prefetch
    
    - Cannot do slab decomposition on inames that share a tag with
      other inames
      -> Is that reasonable?
    
    
    - Not using all hw loop dimensions causes an error, as
      is the case for variant 3 in the rank_one test.
    
    - Measure efficiency of corner cases
    
    
    - Loopy as a data model for implementing custom rewritings
    
    
    - We won't generate WAW barrier-needing dependencies
    
    To-do
    ^^^^^
    
    
    - CSE should be more like variable assignment
    
    
    Future ideas
    ^^^^^^^^^^^^
    
    - Barriers for data exchanged via global vars?
    
    
    - Float4 joining on fetch/store?
    
    - How can one automatically generate something like microblocks?
    
      -> Some sort of axis-adding transform?
    
    - Better for loop bound generation
      -> Try a triangular loop
    
    
    - Sharing of checks across ILP instances
    
    - Eliminate the first (pre-)barrier in a loop.
    
    - Generate automatic test against sequential code.
    
    - Automatically verify that all array access is within bounds.
    
    - Reason about generated code, give user feedback on potential
      improvements.
    
    - Convolutions, Stencils
    
    - DMA engine threads?
    
    - Divisibility, modulo, strides?
    
    - Try, fix indirect addressing
    
    - Use gists (why do disjoint sets arise?)
    
    
    - Nested slab decomposition (in conjunction with conditional hoisting) could
      generate nested conditional code.
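
The nested-slab idea above can be pictured with a small sketch. It is not loopy output; the function name `classify_tiles`, the tile size, and the labels are all made up for illustration. The test on the outer `i` slab is hoisted out of the `j` loop, and the test on the `j` slab is nested inside it, which gives the nested conditional structure the note anticipates:

```python
def classify_tiles(n_i, n_j, tile=16):
    """Label each tile of an n_i x n_j domain by the slab it falls in.

    Hypothetical helper: "bulk" tiles lie fully inside the domain and need
    no bounds checks; the other kinds touch the upper edge in i, j, or both.
    """
    full_i, full_j = n_i // tile, n_j // tile   # tiles fully in bounds
    tiles_i = -(-n_i // tile)                   # ceiling division
    tiles_j = -(-n_j // tile)

    kinds = {}
    for oi in range(tiles_i):
        if oi < full_i:                 # slab test on i, hoisted above the j loop
            for oj in range(tiles_j):
                if oj < full_j:         # nested slab test on j
                    kinds[oi, oj] = "bulk"
                else:
                    kinds[oi, oj] = "edge_j"
        else:
            for oj in range(tiles_j):
                if oj < full_j:
                    kinds[oi, oj] = "edge_i"
                else:
                    kinds[oi, oj] = "corner"
    return kinds
```

For a 40 x 20 domain with the default tile size, this yields six tiles, of which only the two with `oi < 2` and `oj < 1` are bulk tiles.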
    
    
    Dealt with
    ^^^^^^^^^^
    
    
    - Deal with equality constraints.
      (These arise, e.g., when partitioning a loop of length 16 into chunks of 16.)
    
    - dim_{min,max} caching
    
    - Exhaust the search for a no-boost solution first, before looking
      for a schedule with boosts.
    
    
    - Pick not just axis 0, but all axes by lowest available stride
    
    
    - Scheduler tries too many boostability-related options
    
    
    - Automatically generate testing code vs. sequential.
    
    
    - If isl can prove that all operands are positive, may use '/' instead of
      'floor_div'.
    
    
    - For forced workgroup sizes: check that at least one iname
      maps to them.
    
    
    - variable shuffle detection
      -> will need unification
    
    
    - Restrict-to-sequential and tagging have nothing to do with each other.
      -> Removed SequentialTag and turned it into a separate computed kernel
      property.
    
    
    - Just touching a variable written to by a non-idempotent
      instruction makes that instruction also not idempotent
      -> Idempotent renamed to boostable.
      -> Done.
    
    
    - Give the user control over which reduction inames are
      duplicated.
    
    
    - assert dependencies <= parent_inames in loopy/__init__.py
      -> Yes, this must be the case.
    
    - Give a good error message if a parameter assignment in get_problems()
      is missing.
    
    
    - Slab decomposition for ILP
      -> I don't think that's possible.
    
    
    - It is hard to understand error messages that refer to instructions
      generated during preprocessing.
      -> Expose preprocessing to the user so she can inspect the preprocessed
         kernel.
    
    
    
    Andreas Klöckner committed
    - Which variables need to be duplicated for ILP?
      -> Only reduction
    
    
    - implemented_domain may end up being smaller than requested in cse
      evaluations--check that!
    
    
    - Allow prioritization of loops in scheduling.
    
    - Make axpy better.
    
    
    - Screwy lower bounds in slab decomposition
    
    
    - reimplement add_prefetch
    
    
    - Flag, exploit idempotence
    
    - Some things involving CSEs might be impossible to schedule
      a[i,j] = cse(b[i]) * cse(c[j])
    
    - Be smarter about automatic local axis choice
      -> What if we run out of axes?
    
    
    - Implement condition hoisting
      (needed, e.g., by slab decomposition)
    
    
    - Check for non-use of hardware axes
    
    
    - Slab decomposition for parallel dimensions
      - implement at the outermost nesting level regardless
      - bound *all* tagged inames
      - can't slab inames that share tags with other inames (for now)
    
    
    - Make syntax for iname dependencies
    
    - make syntax for insn dependencies
    
    - Implement get_problems()
    
    
    - CSE iname duplication might be unnecessary?
      (don't think so: It might be desired to do a full fetch before a mxm k loop
      even if that requires going iterative.)
    
    
    - Reduction needs to know a neutral element
    
    - Types of reduction variables?
    
    
    - Generalize reduction to be over multiple variables
    
    
    - duplicate_dimensions can be implemented without having to muck around 
      with individual constraints:
      - add_dims
      - move_dims
      - intersect
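
On the '/' versus 'floor_div' entry earlier in this list: the reason positivity matters is that C-style integer division truncates toward zero, while floor division rounds toward negative infinity. The two agree whenever the operands have the same sign (and in particular when both are positive) and can differ otherwise. A minimal sketch, with `c_div` as a hypothetical Python stand-in for C's '/' on integers:

```python
def c_div(a, b):
    """Integer division truncating toward zero, as C's '/' does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def floor_div(a, b):
    """Integer division rounding toward negative infinity."""
    return a // b


# Agreement for a non-negative numerator and positive denominator.
agree_on_positives = all(
    c_div(a, b) == floor_div(a, b)
    for a in range(0, 50)
    for b in range(1, 50)
)

# Disagreement once the signs differ and the division is inexact.
mixed_sign_example = (c_div(-7, 2), floor_div(-7, 2))  # (-3, -4)
```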
    
    
    
    Should a dependency on an iname be forced in a CSE?
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    Local var:
    
    l  | n
    g  | y
    dl | Err
    d  | Err
    
    Private var:
    
    l  | y
    g  | y
    dl | Err
    d  | Err
    
    dg: Invalid -> error
    
    d: is duplicate
    l: is tagged as local idx
    g: is tagged as group idx
    
    Raise error if dl is targeting a private variable, regardless of whether it's
    a dependency or not.
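
The two tables above can be read as a lookup, sketched below. This is one interpretation, not code from loopy: the function name `force_dependency` and the string keys are hypothetical, and "dl" and "dg" are assumed to mean "duplicate and local-tagged" and "duplicate and group-tagged", following the legend.

```python
# (variable kind, iname kind) -> should the dependency be forced?
FORCE_DEPENDENCY = {
    ("local", "l"): False,
    ("local", "g"): True,
    ("private", "l"): True,
    ("private", "g"): True,
}

# Iname kinds the tables mark as "Err" (or "Invalid") for both variable kinds.
ERROR_KINDS = {"d", "dl", "dg"}


def force_dependency(var_kind, iname_kind):
    """Return True/False per the tables, or raise ValueError for error cases."""
    if var_kind not in ("local", "private"):
        raise ValueError(f"unknown variable kind: {var_kind!r}")
    if iname_kind in ERROR_KINDS:
        raise ValueError(
            f"iname kind {iname_kind!r} is an error for {var_kind} variables")
    try:
        return FORCE_DEPENDENCY[var_kind, iname_kind]
    except KeyError:
        raise ValueError(f"unknown iname kind: {iname_kind!r}") from None
```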