diff --git a/doc/misc.rst b/doc/misc.rst
index cd52ae1b0eaaba1fd576491392841a9c48d4ebdf..c339280f58139fd5cfaa1589e4a47b58ea7713bc 100644
--- a/doc/misc.rst
+++ b/doc/misc.rst
@@ -83,8 +83,204 @@ OTHER DEALINGS IN THE SOFTWARE.
 Frequently Asked Questions
 ==========================
 
-The FAQ is maintained collaboratively on the
-`Wiki FAQ page <http://wiki.tiker.net/Loopy/FrequentlyAskedQuestions>`_.
+Is Loopy specific to OpenCL?
+----------------------------
+
+No, absolutely not. You can switch to a different code generation target
+(subclasses of :class:`loopy.TargetBase`) by using (say)::
+
+    knl = knl.copy(target=loopy.CudaTarget())
+
+Also see :ref:`targets`. (Py)OpenCL right now has the best support for
+running kernels directly out of the box, but that could easily be expanded.
+Open an issue to discuss what you need.
+
+In the meantime, you can generate code simply by saying::
+
+    cg_result = loopy.generate_code_v2(knl)
+    print(cg_result.host_code())
+    print(cg_result.device_code())
+
+For what types of codes does :mod:`loopy` work well?
+----------------------------------------------------
+
+Any array-based/number-crunching code whose control flow is not *too*
+data dependent should be expressible. For example:
+
+* Sparse matrix-vector multiplies, despite data-dependent control
+  flow (varying row lengths, say), are easy and natural to express.
+
+* Looping until convergence, on the other hand, is an example
+  of something that can't be expressed easily. Such checks
+  would have to be performed outside of :mod:`loopy` code.
+
+Can I see some examples?
+------------------------
+
+Loopy has a ton of tests, and right now, those are probably the best
+source of examples. Here are some links:
+
+* `Tests directory <https://github.com/inducer/loopy/tree/master/test>`_
+* `Applications tests <https://github.com/inducer/loopy/blob/master/test/test_apps.py>`_
+* `Feature tests <https://github.com/inducer/loopy/blob/master/test/test_loopy.py>`_
+
+Here's a more complicated example of a loopy code:
+
+.. literalinclude:: ../examples/python/find-centers.py
+    :language: python
+
+This example is included in the :mod:`loopy` distribution as
+:download:`examples/python/find-centers.py <../examples/python/find-centers.py>`.
+It finds, for each point in an array of "targets", the nearest "center"
+that satisfies some criteria.
+
+What types of transformations can I do?
+---------------------------------------
+
+This list is always growing, but here are a few pointers:
+
+* Unroll
+
+  Use :func:`loopy.tag_inames` with the ``"unr"`` tag.
+  Unrolled loops must have a fixed size. (See either
+  :func:`loopy.split_iname` or :func:`loopy.fix_parameters`.)
+
+* Stride changes (Row/column/something major)
+
+  Use :func:`loopy.tag_array_axes` with (e.g.) ``stride:17`` or
+  ``N1,N2,N0`` to determine how each axis of an array is realized.
+
+* Prefetch
+
+  Use :func:`loopy.add_prefetch`.
+
+* Reorder loops
+
+  Use :func:`loopy.set_loop_priority`.
+
+* Precompute subexpressions
+
+  Use a :ref:`substitution-rule` to assign a name to a subexpression,
+  using, e.g., :func:`loopy.assignment_to_subst` or :func:`loopy.extract_subst`.
+  Then use :func:`loopy.precompute` to create an (array or scalar)
+  temporary with precomputed values.
+
+* Tile
+
+  Use :func:`loopy.split_iname` to produce enough loops, then use
+  :func:`loopy.set_loop_priority` to set the ordering.
+
+* Fix constants
+
+  Use :func:`loopy.fix_parameters`.
+
+* Parallelize (across cores)
+
+  Use :func:`loopy.tag_inames` with the ``"g.0"``, ``"g.1"`` (and so on) tags.
+
+* Parallelize (across vector lanes)
+
+  Use :func:`loopy.tag_inames` with the ``"l.0"``, ``"l.1"`` (and so on) tags.
+
+* Affinely map loop domains
+
+  Use :func:`loopy.affine_map_inames`.
+
+* Texture-based data access
+
+  Use :func:`loopy.change_arg_to_image` to use texture memory
+  for an argument.
+
+* Kernel Fusion
+
+  Use :func:`loopy.fuse_kernels`.
+
+* Explicit-SIMD Vectorization
+
+  Use :func:`loopy.tag_inames` with the ``"vec"`` iname tag.
+  Note that the corresponding axis of an array must
+  also be tagged using the ``"vec"`` array axis tag
+  (using :func:`loopy.tag_array_axes`) in order for vector code to be
+  generated.
+
+  Vectorized loops (and array axes) must have a fixed size. (See either
+  :func:`loopy.split_iname` or :func:`loopy.fix_parameters` along with
+  :func:`loopy.split_array_axis`.)
+
+* Reuse of Temporary Storage
+
+  Use :func:`loopy.alias_temporaries` to reduce the size of intermediate
+  storage.
+
+* SoA :math:`\leftrightarrow` AoS
+
+  Use :func:`loopy.tag_array_axes` with the ``"sep"`` array axis tag
+  to generate separate arrays for each entry of a short, fixed-length
+  array axis.
+
+  Separated array axes must have a fixed size. (See
+  :func:`loopy.split_array_axis`.)
+
+* Realization of Instruction-level parallelism
+
+  Use :func:`loopy.tag_inames` with the ``"ilp"`` tag.
+  ILP loops must have a fixed size. (See either
+  :func:`loopy.split_iname` or :func:`loopy.fix_parameters`.)
+
+* Type inference
+
+  Use :func:`loopy.add_and_infer_dtypes`.
+
+* Convey assumptions
+
+  Use :func:`loopy.assume` to say, e.g.
+  ``loopy.assume(knl, "N mod 4 = 0")`` or
+  ``loopy.assume(knl, "N > 0")``.
+
+* Perform batch computations
+
+  Use :func:`loopy.to_batched`.
+
+* Interface with your own library functions
+
+  Use :func:`loopy.register_function_manglers`.
+
+Uh-oh. I got a scheduling error. Any hints?
+-------------------------------------------
+
+* Make sure that dependencies between instructions are as
+  you intend.
+
+  Use :func:`loopy.show_dependency_graph` to check.
+
+  There's a heuristic that tries to help find dependencies. If there's
+  only a single write to a variable, then it adds dependencies from all
+  readers to the writer. In the example below, that is counterproductive
+  because it creates a circular dependency, hence the scheduling issue.
+  So the heuristic has to be turned off, like so::
+
+      knl = lp.make_kernel(
+          "{ [t]: 0 <= t < T}",
+          """
+          <> xt = x[t] {id=fetch,dep=*}
+          x[t + 1] = xt * 0.1 {dep=fetch}
+          """)
+
+* Make sure that your loops are correctly nested.
+
+  Print the kernel to make sure all instructions are within
+  the set of inames you intend them to be in.
+
+* One iname is one for loop.
+
+  For sequential loops, one iname corresponds to exactly one
+  ``for`` loop in generated code. Loopy will not generate multiple
+  loops from one iname.
+
+* Read the scheduler's error message.
+
+  The scheduler will try to be as helpful as it can in telling
+  you where it got stuck.
 
 Citing Loopy
 ============
diff --git a/doc/ref_kernel.rst b/doc/ref_kernel.rst
index 560facd63f183e1113ccb7cb94ff1953aced6e7d..e41fbd6e89abbe7fc120b1460f982045d807dca9 100644
--- a/doc/ref_kernel.rst
+++ b/doc/ref_kernel.rst
@@ -468,6 +468,8 @@ Kernel Options
 
 .. autoclass:: Options
 
+.. _targets:
+
 Targets
 -------
 
diff --git a/examples/python/find-centers.py b/examples/python/find-centers.py
new file mode 100644
index 0000000000000000000000000000000000000000..c5e5e916156fd44b5a37cdb3cd41718916461a06
--- /dev/null
+++ b/examples/python/find-centers.py
@@ -0,0 +1,43 @@
+import numpy as np
+import loopy as lp
+import pyopencl as cl
+
+cl_ctx = cl.create_some_context(interactive=True)
+
+knl = lp.make_kernel(
+    "{[ictr,itgt,idim]: "
+    "0<=itgt<ntargets "
+    "and 0<=ictr<ncenters "
+    "and 0<=idim<ambient_dim}",
+
+    """
+    for itgt
+        for ictr
+            <> dist_sq = sum(idim,
+                    (tgt[idim,itgt] - center[idim,ictr])**2)
+            <> in_disk = dist_sq < (radius[ictr]*1.05)**2
+            <> matches = (
+                    (in_disk
+                        and qbx_forced_limit == 0)
+                    or (in_disk
+                            and qbx_forced_limit != 0
+                            and qbx_forced_limit * center_side[ictr] > 0)
+                    )
+
+            <> post_dist_sq = if(matches, dist_sq, HUGE)
+        end
+        <> min_dist_sq, <> min_ictr = argmin(ictr, post_dist_sq)
+
+        tgt_to_qbx_center[itgt] = if(min_dist_sq < HUGE, min_ictr, -1)
+    end
+    """)
+
+knl = lp.fix_parameters(knl, ambient_dim=2)
+knl = lp.add_and_infer_dtypes(knl, {
+        "tgt,center,radius,HUGE": np.float32,
+        "center_side,qbx_forced_limit": np.int32,
+        })
+
+lp.auto_test_vs_ref(knl, cl_ctx, knl, parameters={
+        "HUGE": 1e20, "ncenters": 200, "ntargets": 300,
+        "qbx_forced_limit": 1})
diff --git a/loopy/target/__init__.py b/loopy/target/__init__.py
index eb39539b9c489320b227da7c7397c0748a704159..88e656a1e3a4bfeb25a250dbcb3a05d1f805bac8 100644
--- a/loopy/target/__init__.py
+++ b/loopy/target/__init__.py
@@ -36,6 +36,8 @@ __doc__ = """
 .. autoclass:: OpenCLTarget
 .. autoclass:: PyOpenCLTarget
 .. autoclass:: ISPCTarget
+.. autoclass:: NumbaTarget
+.. autoclass:: NumbaCudaTarget
 
 """
 
diff --git a/loopy/target/numba.py b/loopy/target/numba.py
index 95c1de08c9ef90bda6438d613e45e0515508573d..6946063ee04f52a4890344b4cbff9446bacb6923 100644
--- a/loopy/target/numba.py
+++ b/loopy/target/numba.py
@@ -167,7 +167,7 @@ class NumbaCudaASTBuilder(NumbaBaseASTBuilder):
 
 
 class NumbaCudaTarget(TargetBase):
-    """A target for plain Python, without any parallel extensions.
+    """A target for Numba with CUDA extensions.
     """
 
     host_program_name_suffix = ""