Reference Guide
===============

.. module:: loopy
.. moduleauthor:: Andreas Kloeckner <inform@tiker.net>

This guide documents all functionality exposed by loopy. If you would like
a gentler introduction, consider reading the example-based guide
:ref:`guide` instead.

Inames
------

Loops are (by default) entered exactly once. This is necessary to preserve
dependency semantics: otherwise, for example, a fetch could happen inside one
loop nest while the instruction using that fetch sits inside a wholly
different loop nest.

Integer Domain
--------------

Expressions
-----------

* `if`
* `reductions`
* duplication of reduction inames
* complex-valued arithmetic
* tagging of array access and substitution rule use ("$")

Assignments and Substitution Rules
----------------------------------

Syntax of an instruction::

    label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2

The above example illustrates all the parts that are allowed in loo.py's
instruction syntax. All of these except for `lhs` and `EXPRESSION` are
optional.

* `label` is a unique identifier for this instruction. It lets you refer to
  the instruction unambiguously during later transformations and when
  specifying ordering dependencies.

* `dep_label,dep_label_2` are dependencies of the current instruction. Loo.py
  will enforce that the instructions marked with these labels are scheduled
  before this instruction.

* `<float32>` declares `lhs` as a temporary variable, with shape given by the
  ranges of the `lhs` subscripts. (Note that in this case, the `lhs`
  subscripts must be pure inames, not expressions, for now.) Instead of a
  concrete type, an empty pair of angle brackets `<>` may be given to
  indicate that type inference should determine the type of the temporary.

* `[i,j|k,l]` specifies the inames within which this instruction is run.
  Independent copies of the inames `k` and `l` will be made for this
  instruction.

Syntax of a substitution rule::

    rule_name(arg1, arg2) := EXPRESSION

.. _tags:

Tags
----

===================== ====================================================
Tag                   Meaning
===================== ====================================================
`None` | `"for"`      Sequential loop
`"l.N"`               Local (intra-group) axis N
`"l.auto"`            Automatically chosen local (intra-group) axis
`"g.N"`               Group-number axis N
`"unr"`               Plain unrolling
`"ilp"` | `"ilp.unr"` Unroll using instruction-level parallelism
`"ilp.seq"`           Realize parallel iname as innermost loop
===================== ====================================================

(Throughout this table, `N` must be replaced by an actual number.)

"ILP" does three things:

* Restricts loops to be innermost
* Duplicates reduction storage for any reductions nested around ILP usage
* Causes a loop (unrolled or not) to be opened/generated for each
  involved instruction

.. _automatic-axes:

Automatic Axis Assignment
^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic local axes are chosen as follows:

#. For each instruction containing `"l.auto"` inames:

   #. Find the lowest-numbered unused axis. If none exists,
      use sequential unrolling instead.

   #. Find the iname that has the smallest stride in any global
      array access occurring in the instruction.

   #. Assign the low-stride iname to the available axis, splitting
      the iname if it is too long for the available axis size.

If you need different behavior, use :func:`tag_dimensions` and
:func:`split_dimension` to change the assignment of `"l.auto"` axes
manually.

.. _creating-kernels:

Creating Kernels
----------------

.. _arguments:

Arguments
^^^^^^^^^

.. autoclass:: ScalarArg
    :members:
    :undoc-members:

.. autoclass:: ArrayArg
    :members:
    :undoc-members:

.. autoclass:: ConstantArrayArg
    :members:
    :undoc-members:

.. autoclass:: ImageArg
    :members:
    :undoc-members:

.. _syntax:

String Syntax
^^^^^^^^^^^^^

* Substitution rules
* Instructions

Kernels
^^^^^^^

.. autoclass:: LoopKernel

Do not create :class:`LoopKernel` objects directly.
Instead, use the following function, which takes the same arguments but
performs some extra post-processing.

.. autofunction:: make_kernel

Wrangling dimensions
--------------------

.. autofunction:: split_dimension

.. autofunction:: join_dimensions

.. autofunction:: tag_dimensions

Dealing with Substitution Rules
-------------------------------

.. autofunction:: extract_subst

.. autofunction:: expand_subst

Precomputation and Prefetching
------------------------------

.. autofunction:: precompute

.. autofunction:: add_prefetch

    Uses :func:`extract_subst` and :func:`precompute`.

Manipulating Reductions
-----------------------

.. autofunction:: realize_reduction

Finishing up
------------

.. autofunction:: generate_loop_schedules

.. autofunction:: check_kernels

.. autofunction:: generate_code

Automatic Testing
-----------------

.. autofunction:: auto_test_vs_ref

Troubleshooting
---------------

Printing :class:`LoopKernel` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you're confused about things loopy is referring to in an error message or
about the current state of the :class:`LoopKernel` you are transforming, the
following always works::

    print(kernel)

(It yields a human-readable, albeit terse, representation of *kernel*.)

.. autofunction:: preprocess_kernel

.. autofunction:: get_dot_dependency_graph
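
Illustration: Choosing an Automatic Local Axis
----------------------------------------------

The three steps listed under :ref:`automatic-axes` can be pictured with a
small, self-contained Python sketch. It does not import loopy and is not
loopy's implementation. The function name `assign_one_auto_axis`, the
`length`/`stride` dictionary layout, and the choice to tag every `"l.auto"`
iname as `"unr"` when no axis is free are all assumptions made purely for
illustration.

.. code-block:: python

    def assign_one_auto_axis(auto_inames, used_axes, axis_sizes):
        """Hypothetical helper mirroring the documented steps for ONE instruction.

        auto_inames: dict mapping iname -> {"length": int, "stride": int},
            where "stride" is the smallest stride of that iname in any
            global array access of the instruction (assumed precomputed).
        used_axes: set of local axis numbers already taken in the instruction.
        axis_sizes: list where axis_sizes[n] is the size of local axis n.

        Returns a dict mapping iname -> (tag, split_length_or_None).
        """
        # Step 1: find the lowest-numbered unused axis.
        free_axes = [n for n in range(len(axis_sizes)) if n not in used_axes]
        if not free_axes:
            # Assumption: fall back to plain unrolling for all auto inames.
            return {name: ("unr", None) for name in auto_inames}
        axis = free_axes[0]

        # Step 2: find the iname with the smallest global-access stride.
        # Sorting first makes tie-breaking deterministic (by name).
        iname = min(sorted(auto_inames),
                    key=lambda name: auto_inames[name]["stride"])

        # Step 3: assign it to the axis, noting a split if it is too long.
        too_long = auto_inames[iname]["length"] > axis_sizes[axis]
        split_length = axis_sizes[axis] if too_long else None
        return {iname: ("l.%d" % axis, split_length)}


    # Example: axis 0 is taken, so the low-stride iname "j" goes to axis 1.
    example = assign_one_auto_axis(
        {"i": {"length": 64, "stride": 16}, "j": {"length": 16, "stride": 1}},
        used_axes={0},
        axis_sizes=[16, 8],
    )

In this example, `j` has length 16 but axis 1 only has size 8, so the sketch
reports that `j` would be tagged `"l.1"` after being split to length 8. In
loopy itself, manual control over such choices goes through
:func:`tag_dimensions` and :func:`split_dimension`.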