Document g_times_l.

f50dcc16 · Andreas Klöckner · de28d5c5 · f50dcc16 · f50dcc16
Commit f50dcc16 authored 13 years ago by Andreas Klöckner
--- a/doc/source/misc.rst
+++ b/doc/source/misc.rst
@@ -73,6 +73,11 @@ Version 2011.2
    This version is currently under development. You can get snapshots from
    PyOpenCL's git version control.

+Version 2011.1.2
+----------------
+
+* More bug fixes.
+
 Version 2011.1.1
 ----------------

@@ -108,6 +113,7 @@ Version 2011.1
 * Add :func:`pyopencl.enqueue_copy`. Deprecate all other transfer functions.
 * Add support for numerous extensions, among them device fission.
 * Add a compiler cache.
+* Add the 'g_times_l' keyword arg to kernel execution.

 Version 0.92
 ------------

--- a/doc/source/runtime.rst
+++ b/doc/source/runtime.rst
@@ -782,7 +782,7 @@ Programs and Kernels
               prg.kernel(queue, n_globals, None, args)


-    .. method:: __call__(queue, global_size, local_size, *args, global_offset=None, wait_for=None)
+    .. method:: __call__(queue, global_size, local_size, *args, global_offset=None, wait_for=None, g_times_l=False)

        Use :func:`enqueue_nd_range_kernel` to enqueue a kernel execution, after using
        :meth:`set_args` to set each argument in turn. See the documentation for
@@ -791,6 +791,11 @@ Programs and Kernels

        *None* may be passed for local_size.

+        If *g_times_l* is specified, the global size will be multiplied by the
+        local size. (which makes the behavior more like Nvidia CUDA) In this case,
+        *global_size* and *local_size* also do not have to have the same number
+        of dimensions.
+
        .. versionchanged:: 0.92
            *local_size* was promoted to third positional argument from being a
            keyword argument. The old keyword argument usage will continue to
@@ -802,6 +807,9 @@ Programs and Kernels
            its treatment in the wrapper had a bug (now fixed) that prevented
            it from working.

+        .. versionchanged:: 2011.1
+            Added the *g_times_l* keyword arg.
+
    |comparable|

 .. class:: LocalMemory(size)
@@ -814,9 +822,18 @@ Programs and Kernels

        The size of local buffer in bytes to be provided.

-.. function:: enqueue_nd_range_kernel(queue, kernel, global_work_size, local_work_size, global_work_offset=None, wait_for=None)
+.. function:: enqueue_nd_range_kernel(queue, kernel, global_work_size, local_work_size, global_work_offset=None, wait_for=None, g_times_l=True)

    |std-enqueue-blurb|
+    
+    If *g_times_l* is specified, the global size will be multiplied by the
+    local size. (which makes the behavior more like Nvidia CUDA) In this case,
+    *global_size* and *local_size* also do not have to have the same number
+    of dimensions.
+
+    .. versionchanged:: 2011.1
+        Added the *g_times_l* keyword arg.
+

 .. function:: enqueue_task(queue, kernel, wait_for=None)