From 0e92962d012441cbc8356cd67ea682f7fef2e8c8 Mon Sep 17 00:00:00 2001 From: Matt Wala Date: Fri, 15 Sep 2017 15:48:02 -0500 Subject: [PATCH 1/4] Update tutorial to mention local barrier instructions. --- doc/tutorial.rst | 30 +++++++++++++++++++++--------- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 92ec799f7..ee6b8b098 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -1118,14 +1118,26 @@ Once a work item has reached a barrier, it waits for everyone that it synchronizes with to reach the barrier before continuing. This means that unless all work items reach the same barrier, the kernel will hang during execution. +Barrier insertion +~~~~~~~~~~~~~~~~~ + By default, :mod:`loopy` inserts local barriers between two instructions when it detects that a dependency involving local memory may occur across work items. To -see this in action, take a look at the section on :ref:`local_temporaries`. +see this in action, take a look at the section on :ref:`local_temporaries`. In +contrast, :mod:`loopy` will *not* insert global barriers automatically. + +Barriers may also be inserted manually into the kernel. The syntax for a global +barrier instruction is ``... gbarrier``. The syntax for a local barrier +instruction is ``... lbarrier``. + +Saving temporaries across global barriers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In contrast, :mod:`loopy` will *not* insert global barriers automatically. -Global barriers require manual intervention along with some special -post-processing which we describe below. Consider the following kernel, which -attempts to rotate its input to the right by 1 in parallel: +When working with global barriers it may be necessary to save and reload +temporaries that are live across global barriers. This section presents an +example of how to use :func:`loopy.save_and_reload_temporaries` which is helpful +for that purpose. 
Consider the following kernel, which attempts to rotate its +input to the right by 1 in parallel: .. doctest:: @@ -1153,8 +1165,8 @@ this, :mod:`loopy` will complain that global barrier needs to be inserted: ... MissingBarrierError: Dependency 'rotate depends on maketmp' (for variable 'arr') requires synchronization by a global barrier (add a 'no_sync_with' instruction option to state that no synchronization is needed) -The syntax for a global barrier instruction is ``... gbarrier``. This needs to -be added between the pair of offending instructions. +To address this we add the ``... gbarrier`` instruction between the pair of +offending instructions. .. doctest:: @@ -1175,7 +1187,7 @@ be added between the pair of offending instructions. ... assumptions="n mod 16 = 0") >>> knl = lp.split_iname(knl, "i", 16, inner_tag="l.0", outer_tag="g.0") -When we try to generate code for this, it will still not work. +Here is what happens when we try to generate code for the kernel: >>> cgr = lp.generate_code_v2(knl) Traceback (most recent call last): @@ -1280,7 +1292,7 @@ The kernel translates into two OpenCL kernels. arr[((1 + lid(0) + gid(0) * 16) % n)] = tmp; } -Executing the kernel does what we expect. +Now we can execute the kernel. >>> arr = cl.array.arange(queue, 16, dtype=np.int32) >>> print(arr) -- GitLab From a2e6506fae9d81fb451e13f8616e881bc334730e Mon Sep 17 00:00:00 2001 From: Matt Wala Date: Fri, 15 Sep 2017 15:50:23 -0500 Subject: [PATCH 2/4] may be -> is --- doc/tutorial.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index ee6b8b098..94bcd0c31 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -1133,11 +1133,11 @@ instruction is ``... lbarrier``. Saving temporaries across global barriers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When working with global barriers it may be necessary to save and reload -temporaries that are live across global barriers. 
This section presents an -example of how to use :func:`loopy.save_and_reload_temporaries` which is helpful -for that purpose. Consider the following kernel, which attempts to rotate its -input to the right by 1 in parallel: +When working with global barriers it is necessary to save and reload temporaries +that are live across global barriers. This section presents an example of how to +use :func:`loopy.save_and_reload_temporaries` which is helpful for that +purpose. Consider the following kernel, which attempts to rotate its input to +the right by 1 in parallel: .. doctest:: -- GitLab From 51db49b1c5e31ca0f9681ca88243f2f182e7389c Mon Sep 17 00:00:00 2001 From: Matt Wala Date: Fri, 15 Sep 2017 15:52:40 -0500 Subject: [PATCH 3/4] mention non-global temporaries need to be saved --- doc/tutorial.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 94bcd0c31..13d23dcdd 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -1133,11 +1133,11 @@ instruction is ``... lbarrier``. Saving temporaries across global barriers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When working with global barriers it is necessary to save and reload temporaries -that are live across global barriers. This section presents an example of how to -use :func:`loopy.save_and_reload_temporaries` which is helpful for that -purpose. Consider the following kernel, which attempts to rotate its input to -the right by 1 in parallel: +When working with global barriers it is necessary to save and reload non-global +temporaries that are live across global barriers. This section presents an +example of how to use :func:`loopy.save_and_reload_temporaries` which is helpful +for that purpose. Consider the following kernel, which attempts to rotate its +input to the right by 1 in parallel: .. 
doctest:: -- GitLab From 662918f5673b15bb84c5b996a61988553cf8d0d4 Mon Sep 17 00:00:00 2001 From: Matt Wala Date: Fri, 15 Sep 2017 16:44:16 -0500 Subject: [PATCH 4/4] Add a comment about when spill and reload is necessary; add an example about how loopy complains about a missing global barrier. --- doc/tutorial.rst | 44 ++++++++++++++++++++++++-------------------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 13d23dcdd..12c058fb7 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -1123,21 +1123,12 @@ Barrier insertion By default, :mod:`loopy` inserts local barriers between two instructions when it detects that a dependency involving local memory may occur across work items. To -see this in action, take a look at the section on :ref:`local_temporaries`. In -contrast, :mod:`loopy` will *not* insert global barriers automatically. +see this in action, take a look at the section on :ref:`local_temporaries`. -Barriers may also be inserted manually into the kernel. The syntax for a global -barrier instruction is ``... gbarrier``. The syntax for a local barrier -instruction is ``... lbarrier``. - -Saving temporaries across global barriers -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -When working with global barriers it is necessary to save and reload non-global -temporaries that are live across global barriers. This section presents an -example of how to use :func:`loopy.save_and_reload_temporaries` which is helpful -for that purpose. Consider the following kernel, which attempts to rotate its -input to the right by 1 in parallel: +In contrast, :mod:`loopy` will *not* insert global barriers automatically and +instead will report an error if it detects the need for a global barrier. As an +example, consider the following kernel, which attempts to rotate its input to +the right by 1 in parallel: .. doctest:: @@ -1165,8 +1156,22 @@ this, :mod:`loopy` will complain that global barrier needs to be inserted: ... 
MissingBarrierError: Dependency 'rotate depends on maketmp' (for variable 'arr') requires synchronization by a global barrier (add a 'no_sync_with' instruction option to state that no synchronization is needed) -To address this we add the ``... gbarrier`` instruction between the pair of -offending instructions. +The syntax for inserting a global barrier instruction is +``... gbarrier``. :mod:`loopy` also supports manually inserting local +barriers. The syntax for a local barrier instruction is ``... lbarrier``. + +Saving temporaries across global barriers +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For some platforms (currently only PyOpenCL), :mod:`loopy` implements global +barriers by splitting the kernel into a host-side kernel and multiple +device-side kernels. On such platforms, it will be necessary to save non-global +temporaries that are live across kernel calls. This section presents an example +of how to use :func:`loopy.save_and_reload_temporaries` which is helpful for +that purpose. + +Let us start with an example. Consider the kernel from above with a +``... gbarrier`` instruction that has already been inserted. .. doctest:: @@ -1194,10 +1199,9 @@ Here is what happens when we try to generate code for the kernel: ... MissingDefinitionError: temporary variable 'tmp' gets used in subkernel 'rotate_v2_0' without a definition (maybe you forgot to call loopy.save_and_reload_temporaries?) -To understand what is going on, you need to know that :mod:`loopy` implements -global barriers by splitting the kernel into multiple device-side kernels. The -splitting happens when the instruction schedule is generated. To see the -schedule, we must first call :func:`loopy.get_one_scheduled_kernel`: +This happens due to the kernel splitting done by :mod:`loopy`. The splitting +happens when the instruction schedule is generated.
To see the schedule, we +should call :func:`loopy.get_one_scheduled_kernel`: >>> knl = lp.get_one_scheduled_kernel(lp.preprocess_kernel(knl)) >>> print(knl) -- GitLab