Reference Guide
===============
.. module:: loopy
.. moduleauthor:: Andreas Kloeckner <inform@tiker.net>
This guide documents all functionality exposed by loopy. For a gentler
introduction, consider reading the example-based guide :ref:`guide` instead.
Inames
------
Loops are (by default) entered exactly once. This is necessary to preserve
dependency semantics: otherwise, a fetch could happen inside one loop nest
while the instruction using that fetch sits inside a wholly different
loop nest.
Integer Domain
--------------
Expressions
-----------
* `if`
* complex-valued arithmetic
* tagging of array access and substitution rule use ("$")
Assignments and Substitution Rules
----------------------------------
::

    label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2

The above example illustrates all the parts that are allowed in loopy's
instruction syntax. All of these except for `lhs` and `EXPRESSION` are
optional.

* `label` is a unique identifier for this instruction, enabling you to
  refer back to the instruction uniquely during further transformation
  as well as when specifying ordering dependencies.

* `dep_label,dep_label_2` are dependencies of the current instruction.
  Loopy will enforce that the instructions marked with these labels
  are scheduled before this instruction.

* `<float32>` declares `lhs` as a temporary variable, with shape given
  by the ranges of the `lhs` subscripts. (Note that in this case, the
  `lhs` subscripts must be pure inames, not expressions, for now.)
  Instead of a concrete type, an empty set of angle brackets `<>` may be
  given to indicate that type inference should determine the type of the
  temporary.

* `[i,j|k,l]` specifies the inames within which this instruction is run.
  Independent copies of the inames `k` and `l` will be made for this
  instruction.

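The parts listed above can be picked apart mechanically. The following is a minimal sketch, not loopy's own parser: the regular expression, the function name ``parse_instruction`` and the dictionary keys are all hypothetical and exist only to illustrate which pieces of the syntax are optional. It assumes the expression itself contains no ``:`` character.

```python
import re

# Illustrative only: NOT loopy's parser. Group names are hypothetical.
INSN_RE = re.compile(
    r"^\s*"
    r"(?:(?P<label>\w+)\s*:\s*)?"                 # optional label
    r"(?:\[(?P<inames>[^\]]*)\]\s*)?"             # optional [i,j|k,l]
    r"(?:<(?P<dtype>[^>]*)>\s*)?"                 # optional <float32> or <>
    r"(?P<lhs>\w+(?:\[[^\]]*\])?)\s*=\s*"         # required lhs
    r"(?P<rhs>.+?)"                               # required expression
    r"(?:\s*:\s*(?P<deps>\w+(?:\s*,\s*\w+)*))?"   # optional dependencies
    r"\s*$"
)


def _split_names(text):
    """Split a comma-separated list of names, dropping empty entries."""
    return [name.strip() for name in text.split(",") if name.strip()]


def parse_instruction(text):
    """Split an instruction string into its (mostly optional) parts."""
    m = INSN_RE.match(text)
    if m is None:
        raise ValueError("not a valid instruction: %r" % text)

    inames, duplicated = [], []
    if m.group("inames") is not None:
        # Names after the '|' get independent copies for this instruction.
        before, _, after = m.group("inames").partition("|")
        inames, duplicated = _split_names(before), _split_names(after)

    return {
        "label": m.group("label"),
        "inames": inames,
        "duplicated_inames": duplicated,
        # None: lhs is not a temporary; "": temporary with inferred type
        "temp_dtype": m.group("dtype"),
        "lhs": m.group("lhs"),
        "expression": m.group("rhs").strip(),
        "depends_on": _split_names(m.group("deps") or ""),
    }


parts = parse_instruction(
    "label: [i,j|k,l] <float32> lhs[i,j,k] = EXPRESSION : dep_label, dep_label_2")
```

Keeping ``temp_dtype`` as ``None`` when the angle brackets are absent and as an empty string for ``<>`` preserves the distinction between "not a temporary" and "temporary whose type is to be inferred".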
Syntax of a substitution rule::

    rule_name(arg1, arg2) := EXPRESSION

.. _tags:
Tags
----
===================== ====================================================
Tag                   Meaning
===================== ====================================================
`None` | `"for"`      Sequential loop
`"l.N"`               Local (intra-group) axis N
`"l.auto"`            Automatically chosen local (intra-group) axis
`"unr"`               Plain unrolling
`"ilp"` | `"ilp.unr"` Unroll using instruction-level parallelism
`"ilp.seq"`           Realize parallel iname as innermost loop
===================== ====================================================

(Throughout this table, `N` must be replaced by an actual number.)
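To make the role of `N` concrete, the table can be read as a small lookup. The helper below is a hypothetical illustration, not part of loopy: ``describe_tag`` and its return strings are assumptions made for this sketch, and it covers only the tags listed in the table above.

```python
import re

# Tags from the table that are spelled out literally (no number to fill in).
_FIXED_TAGS = {
    "l.auto": "automatically chosen local (intra-group) axis",
    "unr": "plain unrolling",
    "ilp": "unroll using instruction-level parallelism",
    "ilp.unr": "unroll using instruction-level parallelism",
    "ilp.seq": "realize parallel iname as innermost loop",
}


def describe_tag(tag):
    """Return the table's meaning for *tag* (hypothetical helper)."""
    if tag is None or tag == "for":
        return "sequential loop"

    # "l.N" is only valid once N has been replaced by an actual number.
    m = re.fullmatch(r"l\.(\d+)", tag)
    if m:
        return "local (intra-group) axis %d" % int(m.group(1))

    try:
        return _FIXED_TAGS[tag]
    except KeyError:
        raise ValueError("unknown tag: %r" % (tag,))
```

With this sketch, ``describe_tag("l.0")`` is accepted, while the literal string ``"l.N"`` raises a ``ValueError`` because the placeholder was never filled in.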
"ILP" does three things:

* Restricts loops to be innermost
* Duplicates reduction storage for any reductions nested around ILP usage
* Causes a loop (unrolled or not) to be opened/generated for each
  involved instruction

.. _automatic-axes:
Automatic Axis Assignment
^^^^^^^^^^^^^^^^^^^^^^^^^
Automatic local axes are chosen as follows:

#. For each instruction containing `"l.auto"` inames:

   #. Find the lowest-numbered unused axis. If none exists,
      use sequential unrolling instead.

   #. Find the iname that has the smallest stride in any global
      array access occurring in the instruction.

   #. Assign the low-stride iname to the available axis, splitting
      the iname if it is too long for the available axis size.

If you need different behavior, use :func:`tag_dimensions` and
:func:`split_dimension` to change the assignment of `"l.auto"` axes
manually.
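The selection procedure above can be sketched in plain Python. This is a simplified model under stated assumptions, not loopy's implementation: instructions are hypothetical dictionaries carrying their inames and the smallest global-access stride per iname, "unused" is taken to mean unused by the other inames of the same instruction, and splitting is only recorded rather than performed.

```python
import re


def assign_auto_axes(instructions, tags, axis_sizes, lengths):
    """Resolve "l.auto" tags following the steps described above.

    instructions: list of {"inames": [...], "strides": {iname: stride}}
    tags:         {iname: tag or None}
    axis_sizes:   maximum size of each local axis, indexed by axis number
    lengths:      {iname: loop length}
    Returns the updated tags and a list of (iname, axis_size) pairs for
    inames that would have to be split to fit their axis.
    """
    tags = dict(tags)  # do not modify the caller's dictionary
    needs_split = []

    for insn in instructions:
        while True:
            auto = [n for n in insn["inames"] if tags.get(n) == "l.auto"]
            if not auto:
                break

            # Axes already taken by other inames of this instruction.
            used = set()
            for n in insn["inames"]:
                m = re.fullmatch(r"l\.(\d+)", tags.get(n) or "")
                if m:
                    used.add(int(m.group(1)))

            free = [ax for ax in range(len(axis_sizes)) if ax not in used]
            if not free:
                # No axis left: fall back to sequential unrolling.
                for n in auto:
                    tags[n] = "unr"
                break

            axis = free[0]  # lowest-numbered unused axis
            strides = insn["strides"]
            # Inames without a recorded stride are considered last.
            best = min(auto, key=lambda n: strides.get(n, float("inf")))
            tags[best] = "l.%d" % axis
            if lengths[best] > axis_sizes[axis]:
                needs_split.append((best, axis_sizes[axis]))

    return tags, needs_split


new_tags, to_split = assign_auto_axes(
    [{"inames": ["i", "j", "k"], "strides": {"i": 64, "j": 1}}],
    {"i": "l.auto", "j": "l.auto", "k": None},
    axis_sizes=[16, 16],
    lengths={"i": 16, "j": 128, "k": 8},
)
```

In this example, `j` has the smallest stride and therefore receives axis 0, `i` receives axis 1, and `j` is reported as needing a split because its length exceeds the axis size.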
.. _creating-kernels:
Creating Kernels
----------------
.. _arguments:
Arguments
^^^^^^^^^

.. autoclass:: ScalarArg
    :members:
    :undoc-members:

.. autoclass:: ArrayArg
    :members:
    :undoc-members:

.. autoclass:: ConstantArrayArg
    :members:
    :undoc-members:

.. autoclass:: ImageArg
    :members:
    :undoc-members:

.. _syntax:
String Syntax
^^^^^^^^^^^^^
* Substitution rules
* Instructions
Kernels
^^^^^^^
.. autoclass:: LoopKernel
Do not create :class:`LoopKernel` objects directly. Instead, use the following
function, which takes the same arguments, but does some extra post-processing.
.. autofunction:: make_kernel
Wrangling dimensions
--------------------
.. autofunction:: split_dimension
.. autofunction:: join_dimensions
.. autofunction:: tag_dimensions
Dealing with Substitution Rules
-------------------------------
.. autofunction:: extract_subst
.. autofunction:: expand_subst
Precomputation and Prefetching
------------------------------
.. autofunction:: precompute
.. autofunction:: add_prefetch
Uses :func:`extract_subst` and :func:`precompute`.
Manipulating Reductions
-----------------------
.. autofunction:: realize_reduction
Finishing up
------------
.. autofunction:: generate_loop_schedules
.. autofunction:: check_kernels
.. autofunction:: generate_code
Automatic Testing
-----------------
Troubleshooting
---------------
Printing :class:`LoopKernel` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you're confused about things loopy is referring to in an error message or
about the current state of the :class:`LoopKernel` you are transforming, the
following always works::

    print(kernel)

This yields a human-readable, albeit terse, representation of *kernel*.

.. autofunction:: preprocess_kernel
.. autofunction:: get_dot_dependency_graph