.. _reference:
Reference Guide
===============
.. module:: loopy
.. moduleauthor:: Andreas Kloeckner <inform@tiker.net>
This guide defines all functionality exposed by loopy. If you would prefer
a gentler introduction, consider reading the example-based
:ref:`tutorial` instead.
Loops are (by default) entered exactly once. This is necessary to preserve
dependency semantics--otherwise e.g. a fetch could happen inside one loop nest,
and then the instruction using that fetch could be inside a wholly different
loop nest.
Expressions
^^^^^^^^^^^
Loopy's expressions are a slight superset of the expressions supported by
:mod:`pymbolic`.
* complex-valued arithmetic
* tagging of array access and substitution rule use ("$")
:mod:`loopy` uses the same type system as :mod:`numpy`. (See
:class:`numpy.dtype`.) It also uses :mod:`pyopencl` for a registry of
user-defined types and their C equivalents. See :func:`pyopencl.tools.get_or_register_dtype`.
For a string representation of types, all numpy types (e.g. ``float32``)
are accepted, in addition to what is registered in :mod:`pyopencl`.
Iname Implementation Tags
-------------------------
========================= ====================================================
Tag                       Meaning
========================= ====================================================
``None`` | ``"for"``      Sequential loop
``"l.N"``                 Local (intra-group) axis N
``"g.N"``                 Group-number axis N
``"unr"``                 Unroll
``"ilp"`` | ``"ilp.unr"`` Unroll using instruction-level parallelism
``"ilp.seq"``             Realize parallel iname as innermost loop
========================= ====================================================
(Throughout this table, `N` must be replaced by an actual, zero-based number.)
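To make the ``N`` placeholder concrete, here is a minimal sketch that classifies tag strings from the table. The function name ``classify_iname_tag`` and the returned labels are hypothetical and only illustrate the naming scheme; this is not loopy's own tag parser.

```python
import re

# Illustrative pattern for the axis tags "l.N" and "g.N" (N is a
# zero-based integer). Not taken from loopy's source.
_AXIS_TAG = re.compile(r"(l|g)\.(\d+)")

# Tags without an axis number, mapped to a descriptive label.
_PLAIN_TAGS = {
    "for": "sequential",
    "unr": "unroll",
    "ilp": "ilp.unr",       # "ilp" and "ilp.unr" share one table row
    "ilp.unr": "ilp.unr",
    "ilp.seq": "ilp.seq",
}


def classify_iname_tag(tag):
    """Return a (kind, axis) pair for a tag string from the table."""
    if tag is None:
        return ("sequential", None)
    match = _AXIS_TAG.fullmatch(tag)
    if match:
        kind = "local" if match.group(1) == "l" else "group"
        return (kind, int(match.group(2)))
    if tag in _PLAIN_TAGS:
        return (_PLAIN_TAGS[tag], None)
    raise ValueError(f"unknown iname tag: {tag!r}")


print(classify_iname_tag("l.0"))
print(classify_iname_tag("g.1"))
```

Passing the literal string ``"l.N"`` raises ``ValueError`` in this sketch, which mirrors the requirement that ``N`` be replaced by an actual number.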
"ILP" does three things:
* Restricts loops to be innermost
* Duplicates reduction storage for any reductions nested around ILP usage
* Causes a loop (unrolled or not) to be opened/generated for each
involved instruction
.. _creating-kernels:
Creating Kernels
----------------
.. _arguments:
Arguments
^^^^^^^^^
:members:
:undoc-members:
.. autoclass:: GlobalArg
:members:
:undoc-members:
.. autoclass:: ConstantArg
:members:
:undoc-members:
.. autoclass:: ImageArg
:members:
:undoc-members:
Loop domains
^^^^^^^^^^^^
TODO: Explain the domain tree
.. _isl-syntax:
ISL syntax
~~~~~~~~~~
The general syntax of an ISL set is the following::
{[VARIABLES]: CONDITIONS}
``VARIABLES`` is a simple list of identifiers representing loop indices,
or, as loopy calls them, inames. Example::
{[i, j, k]: CONDITIONS}
The following constructs are supported for ``CONDITIONS``:
* Simple conditions: ``i <= 15``, ``i>0``
* Conjunctions: ``i > 0 and i <= 15``
* Two-sided conditions: ``0 < i <= 15`` (equivalent to the previous
example)
* Identical conditions on multiple variables:
``0 < i,j <= 15``
* Equality constraints: ``i = j*3`` (**Note:** ``=``, not ``==``.)
* Modulo: ``i mod 3 = 0``
* Existential quantifiers: ``(exists l: i = 3*l)`` (equivalent to the
previous example)
Examples of constructs that are **not** allowed:
* Multiplication by non-constants: ``j*k``
* Disjunction: ``(i=1) or (i=5)``
(**Note:** This may be added in a future version of loopy.
For now, loop domains have to be convex.)
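The equivalences claimed in the list of supported conditions can be checked by brute force. The following sketch uses plain Python rather than ISL itself: each set is modeled as a predicate over a small box of integers, and the helper name ``points`` is made up for this illustration.

```python
from itertools import product


def points(condition, bound=16):
    """Enumerate integer pairs (i, j) in [0, bound) that satisfy `condition`."""
    return {
        (i, j)
        for i, j in product(range(bound), repeat=2)
        if condition(i, j)
    }


# {[i, j]: 0 < i,j <= 15 and i mod 3 = 0}
with_mod = points(lambda i, j: 0 < i <= 15 and 0 < j <= 15 and i % 3 == 0)

# {[i, j]: 0 < i,j <= 15 and (exists l: i = 3*l)}
with_exists = points(
    lambda i, j: 0 < i <= 15
    and 0 < j <= 15
    and any(i == 3 * l for l in range(16))
)

# {[i, j]: i > 0 and i <= 15 and j > 0 and j <= 15 and i mod 3 = 0}
with_conjunctions = points(
    lambda i, j: i > 0 and i <= 15 and j > 0 and j <= 15 and i % 3 == 0
)

# All three spellings describe the same set of points.
assert with_mod == with_exists == with_conjunctions
print(len(with_mod))  # 75: five values of i (3, 6, ..., 15) times 15 values of j
```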
Temporary Variables
^^^^^^^^^^^^^^^^^^^
Temporary variables model OpenCL's ``private`` and ``local`` address spaces. Both
have the lifetime of a kernel invocation.
.. autoclass:: TemporaryVariable
:members:
:undoc-members:
.. autoclass:: UniqueName
.. autoclass:: InstructionBase
The general syntax of an instruction is a simple assignment::
LHS[i,j,k] = EXPRESSION
Several extensions of this syntax are defined, as discussed below. They
may be combined freely.
You can also use an instruction to declare a new temporary variable. (See
:ref:`temporaries`.) See :ref:`types` for what types are acceptable. If the
``LHS`` has a subscript, bounds on the indices (which must be constants at
the time of kernel creation) are inferred, and the declared temporary is
created as an array. Instructions declaring temporaries have the following
form::
<temp_var_type> LHS[i,j,k] = EXPRESSION
You can also create a temporary and ask loopy to determine its type
automatically. This uses the following syntax::
<> LHS[i,j,k] = EXPRESSION
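The three forms shown so far (plain assignment, typed temporary, and type-inferred temporary) can be told apart by their prefix. The sketch below is a rough illustration using a regular expression; the pattern and the ``describe`` helper are hypothetical, and loopy's real parser handles far more than this.

```python
import re

# Illustrative pattern only: an optional "<type>" or "<>" prefix, then
# "LHS = EXPRESSION". Attributes in braces are not handled here.
_INSN = re.compile(
    r"\s*(?:<(?P<temp_type>[^>]*)>\s*)?"
    r"(?P<lhs>[^=]+?)\s*=\s*(?P<rhs>.+)"
)


def describe(insn):
    """Return (kind, lhs, rhs) for a single instruction string."""
    match = _INSN.fullmatch(insn)
    if match is None:
        raise ValueError(f"not an assignment: {insn!r}")
    temp_type = match.group("temp_type")
    if temp_type is None:
        kind = "plain assignment"
    elif temp_type == "":
        kind = "temporary, type inferred"
    else:
        kind = f"temporary of type {temp_type}"
    return kind, match.group("lhs"), match.group("rhs").strip()


for text in [
    "out[i] = tmp[i]",
    "<float32> tmp[i,j] = 2*a[i,j]",
    "<> tmp[i] = a[i] + 1",
]:
    print(describe(text))
```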
Lastly, each instruction may optionally have a number of attributes
specified, using the following format::
LHS[i,j,k] = EXPRESSION {attr1,attr2=value1:value2}
These are usually key-value pairs. The following attributes are recognized:
* ``id=value`` sets the instruction's identifier to ``value``. ``value``
must be unique within the kernel. This identifier is used to refer to the
instruction after it has been created, such as from ``dep`` attributes
(see below) or from :mod:`context matches <loopy.context_matching>`.
* ``id_prefix=value`` also sets the instruction's identifier. In this case,
loopy itself ensures uniqueness by appending further components
(often numbers) to the given ``id_prefix``.
* ``inames=i:j:k`` forces the instruction to reside within the loops over
:ref:`inames` ``i``, ``j`` and ``k`` (and only those).
.. note::
The default for the inames that the instruction depends on is
the inames used in the instruction itself plus the common
subset of inames shared by writers of all variables read by the
instruction.
You can add a plus sign ("``+``") to the front of this option
value to indicate that you would like the inames you specify here
to be in addition to the ones found by the heuristic described above.
* ``dup=i:j->j_new:k->k_new`` makes a copy of the inames ``i``, ``j``, and
``k``, with all the same domain constraints as the original inames.
The name of the copy of ``i`` is chosen automatically, whereas the copy
of ``j`` is named ``j_new`` and the copy of ``k`` is named ``k_new``.
This is a shortcut for calling :func:`loopy.duplicate_inames` later
(once the kernel is created).
* ``dep=id1:id2`` creates a dependency of this instruction on the
instructions with identifiers ``id1`` and ``id2``. The meaning of this
dependency is that the code generated for this instruction is required to
appear textually after all of these dependees' generated code.
Identifiers here are allowed to be wildcards as defined by
the Python function :func:`fnmatch.fnmatchcase`. This is helpful in
conjunction with ``id_prefix``.
Since specifying all possible dependencies is cumbersome and
error-prone, :mod:`loopy` employs a heuristic to automatically find
dependencies. Specifically, :mod:`loopy` will automatically add
a dependency to an instruction reading a variable if there is
exactly one instruction writing that variable. ("Variable" here may
mean either temporary variable or kernel argument.)
If each variable in a kernel is only written once, then this
heuristic should be able to compute all required dependencies.
Conversely, if a variable is written by two different instructions,
all ordering around that variable needs to be specified explicitly.
It is recommended to use :func:`get_dot_dependency_graph` to
visualize the dependency graph of possible orderings.
You may use a leading asterisk ("``*``") to turn off the single-writer
heuristic and indicate that the specified list of dependencies is
exhaustive.
* ``priority=integer`` sets the instruction's priority to the value
``integer``. Instructions with higher priority will be scheduled sooner,
if possible. Note that the scheduler may still schedule a lower-priority
instruction ahead of a higher-priority one if loop orders or dependencies
require it.
* ``if=variable1:variable2`` Only execute this instruction if all condition
variables (which must be scalar variables) evaluate to ``true`` (as
defined by C).
* ``tags=tag1:tag2`` Apply tags to this instruction that can then be used
for :ref:`context-matching`.
* ``groups=group1:group2`` Make this instruction part of the given
instruction groups. See :attr:`InstructionBase.groups`.
* ``conflicts_grp=group1:group2`` Make this instruction conflict with the
given instruction groups. See
:attr:`InstructionBase.conflicts_with_groups`.
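The interaction between ``dep`` wildcards, the single-writer heuristic, and the leading asterisk is easier to see on a toy model. The sketch below is plain Python and does not call loopy: the ``instructions`` table, ``parse_attributes``, and ``resolve_deps`` are hypothetical names, and placing the asterisk at the very front of the ``dep`` value is an assumption about the syntax. Only :func:`fnmatch.fnmatchcase` is a real library call.

```python
from fnmatch import fnmatchcase


def parse_attributes(text):
    """Split 'attr1,attr2=value1:value2' into a {name: value} dict."""
    attrs = {}
    for item in text.split(","):
        name, _, value = item.strip().partition("=")
        attrs[name.strip()] = value.strip()
    return attrs


def resolve_deps(insn_id, instructions):
    """Return the ids that `insn_id` depends on in this toy model."""
    insn = instructions[insn_id]
    dep_value = parse_attributes(insn["attrs"]).get("dep", "")

    # Assumed syntax: a leading "*" marks the dependency list as exhaustive.
    exhaustive = dep_value.startswith("*")
    if exhaustive:
        dep_value = dep_value[1:]
    patterns = [p for p in dep_value.split(":") if p]

    # Explicit dependencies, with wildcard matching on identifiers.
    deps = {
        other
        for other in instructions
        if other != insn_id and any(fnmatchcase(other, p) for p in patterns)
    }

    # Single-writer heuristic: add the writer of each variable read,
    # but only if exactly one instruction writes that variable.
    if not exhaustive:
        for var in insn["reads"]:
            writers = [i for i, d in instructions.items() if d["writes"] == var]
            if len(writers) == 1 and writers[0] != insn_id:
                deps.add(writers[0])
    return deps


# Hypothetical kernel: "acc" has two writers, every other variable has one.
instructions = {
    "fetch_0": {"writes": "a_tmp", "reads": {"a"}, "attrs": "id_prefix=fetch"},
    "fetch_1": {"writes": "b_tmp", "reads": {"b"}, "attrs": "id_prefix=fetch"},
    "init": {"writes": "acc", "reads": set(), "attrs": "id=init"},
    "update": {"writes": "acc", "reads": {"acc", "a_tmp"},
               "attrs": "id=update,dep=init"},
    "store": {"writes": "out", "reads": {"acc", "b_tmp"},
              "attrs": "id=store,dep=*update"},
    "finish": {"writes": "done", "reads": set(),
               "attrs": "id=finish,dep=fetch_*"},
}

for name in ("update", "store", "finish"):
    print(name, sorted(resolve_deps(name, instructions)))
```

In this model, ``update`` picks up ``fetch_0`` automatically because ``a_tmp`` has a single writer, but its ordering relative to ``init`` has to be stated explicitly because ``acc`` is written twice. ``store`` depends only on ``update`` because its list is marked exhaustive.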
Assignment instructions are expressed as instances of the following class:
.. autoclass:: ExpressionInstruction
.. _expression-syntax:
Expression Syntax
~~~~~~~~~~~~~~~~~
TODO: Functions
TODO: Reductions
.. autoclass:: CInstruction
Kernels
^^^^^^^
Do not create :class:`LoopKernel` objects directly. Instead, use the following
function:
.. autofunction:: make_kernel
.. autofunction:: parse_fortran
.. autofunction:: parse_transformed_fortran
.. autofunction:: fuse_kernels
.. autofunction:: c_preprocess
.. autofunction:: parse_stack_match
.. currentmodule:: loopy
.. autofunction:: duplicate_inames
.. undocumented .. autofunction:: link_inames
.. autofunction:: set_loop_priority
.. autofunction:: split_reduction_inward
.. autofunction:: split_reduction_outward
Dealing with Parameters
^^^^^^^^^^^^^^^^^^^^^^^
Dealing with Substitution Rules
.. autofunction:: extract_subst
.. autofunction:: assignment_to_subst
.. autofunction:: expand_subst
.. autofunction:: find_rules_matching
.. autofunction:: find_one_rule_matching
.. autofunction:: precompute
.. autofunction:: add_prefetch
.. autofunction:: alias_temporaries
Influencing data access
^^^^^^^^^^^^^^^^^^^^^^^
.. autofunction:: change_arg_to_image
.. autofunction:: tag_data_axes
.. autofunction:: remove_unused_arguments
Padding
^^^^^^^
.. autofunction:: split_arg_axis
.. autofunction:: find_padding_multiple
.. autofunction:: add_padding
Manipulating Instructions
.. autofunction:: set_instruction_priority
.. autofunction:: add_dependency
.. autofunction:: remove_instructions
.. autofunction:: tag_instructions
Library interface
^^^^^^^^^^^^^^^^^
.. autofunction:: register_reduction_parser
.. autofunction:: register_preamble_generators
.. autofunction:: register_symbol_manglers
.. autofunction:: register_function_manglers
Arguments
^^^^^^^^^
.. autofunction:: set_argument_order
.. autofunction:: add_dtypes
.. autofunction:: infer_unknown_types
.. autofunction:: add_and_infer_dtypes
Batching
^^^^^^^^
.. autofunction:: to_batched
.. autofunction:: generate_loop_schedules
.. autofunction:: get_one_scheduled_kernel
.. autofunction:: generate_code
Running
-------
.. autoclass:: CompiledKernel
Automatic Testing
-----------------
Troubleshooting
---------------
Printing :class:`LoopKernel` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you're confused about things loopy is referring to in an error message or
about the current state of the :class:`LoopKernel` you are transforming, the
following always works::
print(kernel)
(And it yields a human-readable--albeit terse--representation of *kernel*.)
.. autofunction:: preprocess_kernel
.. autofunction:: get_dot_dependency_graph
.. autofunction:: show_dependency_graph
Options
-------
.. autoclass:: Options
.. autofunction:: set_options
Controlling caching
-------------------
.. autofunction:: set_caching_enabled
.. autoclass:: CacheMode
Obtaining Kernel Statistics
---------------------------
.. autofunction:: get_op_poly
.. autofunction:: get_gmem_access_poly
.. autofunction:: sum_mem_access_to_bytes
.. autofunction:: get_barrier_poly
.. vim: tw=75:spell