diff --git a/doc/runtime_memory.rst b/doc/runtime_memory.rst index f4e01f26642c9cd76153da90bb6dffaed4ecc7d9..cfe41dc565e35d6bcab81de7d07de02b5262af06 100644 --- a/doc/runtime_memory.rst +++ b/doc/runtime_memory.rst @@ -116,14 +116,109 @@ by both the host and the device. *Coarse-grain* SVM requires that buffers be mapped before being accessed on the host, *fine-grain* SVM does away with that requirement. +.. warning:: + + Compared to :class:`Buffer`\ s, SVM brings with it a new concern: the + synchronization of memory deallocation. Unlike other objects in OpenCL, + SVM is represented by a plain (C-language) pointer and thus has no ability for + reference counting. + + As a result, it is perfectly legal to allocate a :class:`Buffer`, enqueue an + operation on it, and release the buffer, without worrying about whether the + operation has completed. The OpenCL implementation will keep the buffer alive + until the operation has completed. This is *not* the case with SVM: Unless + otherwise specified, memory deallocation is performed immediately when + requested, and so SVM will be deallocated whenever the Python + garbage collector sees fit, even if the operation has not completed, + immediately leading to undefined behavior (i.e., typically, memory corruption and, + before too long, a crash). + + Version 2022.2 of PyOpenCL offers substantially improved tools + for dealing with this. In particular, all means for allocating SVM + allow specifying a :class:`CommandQueue`, so that deallocation + is enqueued and performed after previously-enqueued operations + have completed. + SVM requires OpenCL 2.0. +.. _opaque-svm: + +Opaque and "Wrapped-:mod:`numpy`" Styles of Referencing SVM +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When trying to pass SVM pointers to functionality in :mod:`pyopencl`, +two styles are supported: + +- First, the opaque style. This style most closely resembles + :class:`Buffer`-based allocation available in OpenCL 1.x. + SVM pointers are held in opaque "handle" objects such as :class:`SVMAllocation`. + +- Second, the wrapped-:mod:`numpy` style. In this case, a :class:`numpy.ndarray` + (or another object implementing the :c:func:`Python buffer protocol + `) serves as the reference to an area of SVM. + This style permits using memory areas with :mod:`pyopencl`'s SVM + interfaces even if they were allocated outside of :mod:`pyopencl`. + + Since passing a :class:`numpy.ndarray` (or another type of object obeying the + buffer interface) already has existing semantics in most settings in + :mod:`pyopencl` (such as when passing arguments to a kernel or calling + :func:`enqueue_copy`), there exists a wrapper object, :class:`SVM`, that may + be "wrapped around" these objects to mark them as SVM. + +The commonality between the two styles is that both ultimately implement +the :class:`SVMPointer` interface, which :mod:`pyopencl` uses to obtain +the actual SVM pointer. + +Note that it is easily possible to obtain a :class:`numpy.ndarray` view of SVM +areas held in the opaque style, see :attr:`SVMPointer.buf`, permitting +transitions from opaque to wrapped-:mod:`numpy` style. The opposite transition +(from wrapped-:mod:`numpy` to opaque) is not necessarily straightforward, +as it would require "fishing" the opaque SVM handle out of a chain of +:attr:`numpy.ndarray.base` attributes (or similar, depending on +the actual object serving as the main SVM reference). + +See :ref:`numpy-svm-helpers` for helper functions that ease setting up the +wrapped-:mod:`numpy` structure. 
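+
+For illustration, here is a minimal sketch of both styles (assuming an
+existing :class:`Context` ``ctx`` and an in-order :class:`CommandQueue`
+``queue``; the variable names are placeholders)::
+
+    import numpy as np
+    import pyopencl as cl
+
+    # Opaque style: the SVMAllocation object itself serves as the handle.
+    alloc = cl.SVMAllocation(
+        ctx, 1024, alignment=0, flags=cl.svm_mem_flags.READ_WRITE,
+        queue=queue)
+
+    # Wrapped-numpy style: a numpy array backed by SVM, wrapped in
+    # cl.SVM when passed to pyopencl functionality.
+    ary = cl.svm_empty(ctx, cl.svm_mem_flags.READ_WRITE, 10, np.float32,
+                       queue=queue)
+    svm_ary = cl.SVM(ary)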
+ +Wrapped-:mod:`numpy` SVM tends to be a good fit for fine-grain SVM because of +the ease of direct host-side access, but the creation of the nested structure +that makes this possible is associated with a certain amount of cost. + +By comparison, opaque SVM access tends to be a good fit for coarse-grain +SVM, because direct host access is not possible without mapping the array +anyway, and it has lower setup cost. It is of course entirely possible to use +opaque SVM access with fine-grain SVM. + +.. versionchanged:: 2022.2 + + This version adds the opaque style of SVM access. + +Using SVM with Arrays +^^^^^^^^^^^^^^^^^^^^^ + +While all types of SVM can be used as the memory backing +:class:`pyopencl.array.Array` objects, ensuring that new arrays returned +by array operations (e.g. arithmetic) also use SVM is easiest to accomplish +by passing an :class:`~pyopencl.tools.SVMAllocator` (or +:class:`~pyopencl.tools.SVMPool`) as the *allocator* parameter in functions +returning new arrays. + +SVM Pointers, Allocations, and Maps +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: SVMPointer + +.. autoclass:: SVMAllocation + .. autoclass:: SVM .. autoclass:: SVMMap -Allocating SVM -^^^^^^^^^^^^^^ + +.. _numpy-svm-helpers: + +Helper functions for :mod:`numpy`-based SVM allocation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. autofunction:: svm_empty .. autofunction:: svm_empty_like @@ -140,11 +235,6 @@ Operations on SVM .. autofunction:: enqueue_svm_memfill .. autofunction:: enqueue_svm_migratemem -SVM Allocation Holder -^^^^^^^^^^^^^^^^^^^^^ - -.. autoclass:: SVMAllocation - Image ----- @@ -406,3 +496,11 @@ Pipes See :class:`pipe_info` for values of *param*. +Type aliases +------------ + +.. currentmodule:: pyopencl._cl + +.. class:: Buffer + + See :class:`pyopencl.Buffer`. diff --git a/doc/tools.rst b/doc/tools.rst index 7fdde084ee6be97fc0fb05309927a02ccd8f4107..080d1c89a1d4fe9012e50df24dfd40052ff0ca0d 100644 --- a/doc/tools.rst +++ b/doc/tools.rst @@ -1,203 +1,4 @@ Built-in Utilities ================== -.. module:: pyopencl.tools - -.. _memory-pools: - -Memory Pools ------------- - -The constructor :func:`pyopencl.Buffer` can consume a fairly large amount of -processing time if it is invoked very frequently. For example, code based on -:class:`pyopencl.array.Array` can easily run into this issue because a -fresh memory area is allocated for each intermediate result. Memory pools are a -remedy for this problem based on the observation that often many of the block -allocations are of the same sizes as previously used ones. - -Then, instead of fully returning the memory to the system and incurring the -associated reallocation overhead, the pool holds on to the memory and uses it -to satisfy future allocations of similarly-sized blocks. The pool reacts -appropriately to out-of-memory conditions as long as all memory allocations -are made through it. Allocations performed from outside of the pool may run -into spurious out-of-memory conditions due to the pool owning much or all of -the available memory. - -Using :class:`pyopencl.array.Array` instances with a :class:`MemoryPool` is -not complicated:: - - mem_pool = pyopencl.tools.MemoryPool(pyopencl.tools.ImmediateAllocator(queue)) - a_dev = cl_array.arange(queue, 2000, dtype=np.float32, allocator=mem_pool) - -.. class:: PooledBuffer - - An object representing a :class:`MemoryPool`-based allocation of - device memory. Once this object is deleted, its associated device - memory is returned to the pool. 
This supports the same interface - as :class:`pyopencl.Buffer`. - -.. class:: AllocatorInterface - - An interface implemented by various memory allocation functions - in :mod:`pyopencl`. - - .. method:: __call__(size) - - Allocate and return a :class:`pyopencl.Buffer` of the given *size*. - -.. class:: DeferredAllocator(context, mem_flags=pyopencl.mem_flags.READ_WRITE) - - *mem_flags* takes its values from :class:`pyopencl.mem_flags` and corresponds - to the *flags* argument of :class:`pyopencl.Buffer`. DeferredAllocator - has the same semantics as regular OpenCL buffer allocation, i.e. it may - promise memory to be available that may (in any call to a buffer-using - CL function) turn out to not exist later on. (Allocations in CL are - bound to contexts, not devices, and memory availability depends on which - device the buffer is used with.) - - Implements :class:`AllocatorInterface`. - - .. versionchanged :: 2013.1 - - ``CLAllocator`` was deprecated and replaced - by :class:`DeferredAllocator`. - - .. method:: __call__(size) - - Allocate a :class:`pyopencl.Buffer` of the given *size*. - - .. versionchanged :: 2020.2 - - The allocator will succeed even for allocations of size zero, - returning *None*. - -.. class:: ImmediateAllocator(queue, mem_flags=pyopencl.mem_flags.READ_WRITE) - - *mem_flags* takes its values from :class:`pyopencl.mem_flags` and corresponds - to the *flags* argument of :class:`pyopencl.Buffer`. - :class:`ImmediateAllocator` will attempt to ensure at allocation time that - allocated memory is actually available. If no memory is available, an out-of-memory - error is reported at allocation time. - - Implements :class:`AllocatorInterface`. - - .. versionadded:: 2013.1 - - .. method:: __call__(size) - - Allocate a :class:`pyopencl.Buffer` of the given *size*. - - .. versionchanged :: 2020.2 - - The allocator will succeed even for allocations of size zero, - returning *None*. - -.. class:: MemoryPool(allocator[, leading_bits_in_bin_id]) - - A memory pool for OpenCL device memory. *allocator* must be an instance of - one of the above classes, and should be an :class:`ImmediateAllocator`. - The memory pool assumes that allocation failures are reported - by the allocator immediately, and not in the OpenCL-typical - deferred manner. - - Implements :class:`AllocatorInterface`. - - .. note:: - - The current implementation of the memory pool will retain allocated - memory after it is returned by the application and keep it in a bin - identified by the leading *leading_bits_in_bin_id* bits of the - allocation size. To ensure that allocations within each bin are - interchangeable, allocation sizes are rounded up to the largest size - that shares the leading bits of the requested allocation size. - - The current default value of *leading_bits_in_bin_id* is - four, but this may change in future versions and is not - guaranteed. - - *leading_bits_in_bin_id* must be passed by keyword, - and its role is purely advisory. It is not guaranteed - that future versions of the pool will use the - same allocation scheme and/or honor *leading_bits_in_bin_id*. - - .. versionchanged:: 2019.1 - - Current bin allocation behavior documented, *leading_bits_in_bin_id* - added. - - .. attribute:: held_blocks - - The number of unused blocks being held by this pool. - - .. attribute:: active_blocks - - The number of blocks in active use that have been allocated - through this pool. - - .. attribute:: managed_bytes - - "Managed" memory is "active" and "held" memory. - - .. versionadded: 2021.1.2 - - .. 
attribute:: active_bytes - - "Active" bytes are bytes under the control of the application. - This may be smaller than the actual allocated size reflected - in :attr:`managed_bytes`. - - .. versionadded: 2021.1.2 - - .. method:: allocate(size) - - Return a :class:`PooledBuffer` of the given *size*. - - .. method:: __call__(size) - - Synonym for :meth:`allocate` to match the :class:`AllocatorInterface`. - - .. versionadded: 2011.2 - - .. method:: free_held - - Free all unused memory that the pool is currently holding. - - .. method:: stop_holding - - Instruct the memory to start immediately freeing memory returned - to it, instead of holding it for future allocations. - Implicitly calls :meth:`free_held`. - This is useful as a cleanup action when a memory pool falls out - of use. - -CL-Object-dependent Caching ---------------------------- - -.. autofunction:: first_arg_dependent_memoize -.. autofunction:: clear_first_arg_caches - -Testing -------- - -.. function:: pytest_generate_tests_for_pyopencl(metafunc) - - Using the line:: - - from pyopencl.tools import pytest_generate_tests_for_pyopencl \ - as pytest_generate_tests - - in your `pytest `_ test scripts allows you to use the - arguments *ctx_factory*, *device*, or *platform* in your test functions, - and they will automatically be run for each OpenCL device/platform in the - system, as appropriate. - - The following two environment variables are also supported to control - device/platform choice:: - - PYOPENCL_TEST=0:0,1;intel=i5,i7 - -Device Characterization ------------------------ - -.. automodule:: pyopencl.characterize - :members: +.. automodule:: pyopencl.tools diff --git a/pyopencl/__init__.py b/pyopencl/__init__.py index fef444ba115037bf9e8ebcacb305326b67134fc7..ab042c0ff4307af732ffc269813d4465b6b553b7 100644 --- a/pyopencl/__init__.py +++ b/pyopencl/__init__.py @@ -22,6 +22,7 @@ THE SOFTWARE. from sys import intern from warnings import warn +from typing import Union, Any, Optional, Sequence from pyopencl.version import VERSION, VERSION_STATUS, VERSION_TEXT # noqa @@ -199,11 +200,9 @@ if get_cl_header_version() >= (1, 2): if get_cl_header_version() >= (2, 0): from pyopencl._cl import ( # noqa - SVMAllocation, + SVMPointer, SVM, - - # FIXME - #enqueue_svm_migratemem, + SVMAllocation, ) if _cl.have_gl(): @@ -1124,44 +1123,166 @@ def _add_functionality(): # }}} - # {{{ SVMAllocation + # {{{ SVMPointer if get_cl_header_version() >= (2, 0): - SVMAllocation.__doc__ = """An object whose lifetime is tied to an - allocation of shared virtual memory. + SVMPointer.__doc__ = """A base class for things that can be passed to + functions that allow an SVM pointer, e.g. kernel enqueues and memory + copies. - .. note:: + Objects of this type cannot currently be directly created or + implemented in Python. To obtain objects implementing this type, + consider its subtypes :class:`SVMAllocation` and :class:`SVM`. - Most likely, you will not want to use this directly, but rather - :func:`svm_empty` and related functions which allow access to this - functionality using a friendlier, more Pythonic interface. - .. versionadded:: 2016.2 + .. property:: svm_ptr - .. automethod:: __init__(self, ctx, size, alignment, flags=None) - .. automethod:: release - .. automethod:: enqueue_release + Gives the SVM pointer as an :class:`int`. + + .. property:: size + + An :class:`int` denoting the size in bytes, or *None*, if the size + of the SVM pointed to is not known. + + *Most* objects of this type (e.g. 
instances of + :class:`SVMAllocation` and :class:`SVM` know their size, so that, + for example :class:`enqueue_copy` will automatically copy an entire + :class:`SVMAllocation` when a size is not explicitly specified. + + .. automethod:: map + .. automethod:: map_ro + .. automethod:: map_rw + .. automethod:: as_buffer + .. property:: buf + + An opaque object implementing the :c:func:`Python buffer protocol + `. It exposes the pointed-to memory as + a one-dimensional buffer of bytes, with the size matching + :attr:`size`. + + No guarantee is provided that two references to this attribute + result in the same object. """ - if get_cl_header_version() >= (2, 0): - svmallocation_old_init = SVMAllocation.__init__ + def svmptr_map(self, queue: CommandQueue, *, flags: int, is_blocking: bool = + True, wait_for: Optional[Sequence[Event]] = None, + size: Optional[Event] = None) -> "SVMMap": + """ + :arg is_blocking: If *False*, subsequent code must wait on + :attr:`SVMMap.event` in the returned object before accessing the + mapped memory. + :arg flags: a combination of :class:`pyopencl.map_flags`. + :arg size: The size of the map in bytes. If not provided, defaults to + :attr:`size`. - def svmallocation_init(self, ctx, size, alignment, flags, _interface=None): + |std-enqueue-blurb| + """ + return SVMMap(self, + np.asarray(self.buf), + queue, + _cl._enqueue_svm_map(queue, is_blocking, flags, self, wait_for, + size=size)) + + def svmptr_map_ro(self, queue: CommandQueue, *, is_blocking: bool = True, + wait_for: Optional[Sequence[Event]] = None, + size: Optional[int] = None) -> "SVMMap": + """Like :meth:`map`, but with *flags* set for a read-only map. + """ + + return self.map(queue, flags=map_flags.READ, + is_blocking=is_blocking, wait_for=wait_for, size=size) + + def svmptr_map_rw(self, queue: CommandQueue, *, is_blocking: bool = True, + wait_for: Optional[Sequence[Event]] = None, + size: Optional[int] = None) -> "SVMMap": + """Like :meth:`map`, but with *flags* set for a read-only map. + """ + + return self.map(queue, flags=map_flags.READ | map_flags.WRITE, + is_blocking=is_blocking, wait_for=wait_for, size=size) + + def svmptr__enqueue_unmap(self, queue, wait_for=None): + return _cl._enqueue_svm_unmap(queue, self, wait_for) + + def svmptr_as_buffer(self, ctx: Context, *, flags: Optional[int] = None, + size: Optional[int] = None) -> Buffer: """ :arg ctx: a :class:`Context` - :arg flags: some of :class:`svm_mem_flags`. + :arg flags: a combination of :class:`pyopencl.map_flags`, defaults to + read-write. + :arg size: The size of the map in bytes. If not provided, defaults to + :attr:`size`. + :returns: a :class:`Buffer` corresponding to *self*. + + The memory referred to by this object must not be freed before + the returned :class:`Buffer` is released. 
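+
+        A minimal usage sketch (assuming an existing allocation
+        ``svm_alloc`` and a :class:`Context` ``ctx``; both names are
+        placeholders)::
+
+            buf = svm_alloc.as_buffer(ctx)
+            # buf may be passed wherever a Buffer is expected, as long
+            # as svm_alloc remains alive.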
""" - svmallocation_old_init(self, ctx, size, alignment, flags) - # mem_flags.READ_ONLY applies to kernels, not the host - read_write = True - _interface["data"] = ( - int(self._ptr_as_int()), not read_write) + if flags is None: + flags = mem_flags.READ_WRITE | mem_flags.USE_HOST_PTR + + if size is None: + size = self.size - self.__array_interface__ = _interface + return Buffer(ctx, flags, size=size, hostbuf=self.buf) if get_cl_header_version() >= (2, 0): - SVMAllocation.__init__ = svmallocation_init + SVMPointer.map = svmptr_map + SVMPointer.map_ro = svmptr_map_ro + SVMPointer.map_rw = svmptr_map_rw + SVMPointer._enqueue_unmap = svmptr__enqueue_unmap + SVMPointer.as_buffer = svmptr_as_buffer + + # }}} + + # {{{ SVMAllocation + + if get_cl_header_version() >= (2, 0): + SVMAllocation.__doc__ = """ + Is a :class:`SVMPointer`. + + .. versionadded:: 2016.2 + + .. automethod:: __init__ + + :arg flags: See :class:`svm_mem_flags`. + :arg queue: If not specified, the allocation will be freed + eagerly, irrespective of whether pending/enqueued operations + are still using this memory. + + If specified, deallocation of the memory will be enqueued + with the given queue, and will only be performed + after previously-enqueue operations in the queue have + completed. + + It is an error to specify an out-of-order queue. + + .. warning:: + + Not specifying a queue will typically lead to undesired + behavior, including crashes and memory corruption. + See the warning in :ref:`svm`. + + .. automethod:: enqueue_release + + Enqueue the release of this allocation into *queue*. + If *queue* is not specified, enqueue the deallocation + into the queue provided at allocation time or via + :class:`bind_to_queue`. + + .. automethod:: bind_to_queue + + Change the queue used for implicit enqueue of deallocation + to *queue*. Sufficient synchronization is ensured by + enqueuing a marker into the old queue and waiting on this + marker in the new queue. + + .. automethod:: unbind_from_queue + + Configure the allocation to no longer implicitly enqueue + memory allocation. If such a queue was previously provided, + :meth:`~CommandQueue.finish` is automatically called on it. + """ # }}} @@ -1172,23 +1293,14 @@ def _add_functionality(): (such as a :class:`numpy.ndarray`) as referring to shared virtual memory. + Is a :class:`SVMPointer`, hence objects of this type may be passed + to kernel calls and :func:`enqueue_copy`, and all methods declared + there are also available there. Note that :meth:`map` differs + slightly from :meth:`SVMPointer.map`. + Depending on the features of the OpenCL implementation, the following types of objects may be passed to/wrapped in this type: - * coarse-grain shared memory as returned by (e.g.) :func:`csvm_empty` - for any implementation of OpenCL 2.0. - - This is how coarse-grain SVM may be used from both host and device:: - - svm_ary = cl.SVM( - cl.csvm_empty(ctx, 1000, np.float32, alignment=64)) - assert isinstance(svm_ary.mem, np.ndarray) - - with svm_ary.map_rw(queue) as ary: - ary.fill(17) # use from host - - prg.twice(queue, svm_ary.mem.shape, None, svm_ary) - * fine-grain shared memory as returned by (e.g.) :func:`fsvm_empty`, if the implementation supports fine-grained shared virtual memory. This memory may directly be passed to a kernel:: @@ -1215,10 +1327,28 @@ def _add_functionality(): queue.finish() # synchronize print(ary) # access from host - Objects of this type may be passed to kernel calls and - :func:`enqueue_copy`. 
Coarse-grain shared-memory *must* be mapped - into host address space using :meth:`map` before being accessed - through the :mod:`numpy` interface. + * coarse-grain shared memory as returned by (e.g.) :func:`csvm_empty` + for any implementation of OpenCL 2.0. + + .. note:: + + Applications making use of coarse-grain SVM may be better + served by opaque-style SVM. See :ref:`opaque-svm`. + + This is how coarse-grain SVM may be used from both host and device:: + + svm_ary = cl.SVM( + cl.csvm_empty(ctx, 1000, np.float32, alignment=64)) + assert isinstance(svm_ary.mem, np.ndarray) + + with svm_ary.map_rw(queue) as ary: + ary.fill(17) # use from host + + prg.twice(queue, svm_ary.mem.shape, None, svm_ary) + + Coarse-grain shared-memory *must* be mapped into host address space + using :meth:`~SVMPointer.map` before being accessed through the + :mod:`numpy` interface. .. note:: @@ -1239,9 +1369,10 @@ def _add_functionality(): .. automethod:: map .. automethod:: map_ro .. automethod:: map_rw - .. automethod:: as_buffer """ + # }}} + if get_cl_header_version() >= (2, 0): svm_old_init = SVM.__init__ @@ -1255,14 +1386,18 @@ def _add_functionality(): :arg is_blocking: If *False*, subsequent code must wait on :attr:`SVMMap.event` in the returned object before accessing the mapped memory. - :arg flags: a combination of :class:`pyopencl.map_flags`, defaults to - read-write. + :arg flags: a combination of :class:`pyopencl.map_flags`. :returns: an :class:`SVMMap` instance + This differs from the inherited :class:`SVMPointer.map` in that no size + can be specified, and that :attr:`mem` is the exact array produced + when the :class:`SVMMap` is used as a context manager. + |std-enqueue-blurb| """ return SVMMap( self, + self.mem, queue, _cl._enqueue_svm_map(queue, is_blocking, flags, self, wait_for)) @@ -1281,29 +1416,12 @@ def _add_functionality(): def svm__enqueue_unmap(self, queue, wait_for=None): return _cl._enqueue_svm_unmap(queue, self, wait_for) - def svm_as_buffer(self, ctx, flags=None): - """ - :arg ctx: a :class:`Context` - :arg flags: a combination of :class:`pyopencl.map_flags`, defaults to - read-write. - :returns: a :class:`Buffer` corresponding to *self*. - - The memory referred to by this object must not be freed before - the returned :class:`Buffer` is released. - """ - - if flags is None: - flags = mem_flags.READ_WRITE - - return Buffer(ctx, flags, size=self.mem.nbytes, hostbuf=self.mem) - if get_cl_header_version() >= (2, 0): SVM.__init__ = svm_init SVM.map = svm_map SVM.map_ro = svm_map_ro SVM.map_rw = svm_map_rw SVM._enqueue_unmap = svm__enqueue_unmap - SVM.as_buffer = svm_as_buffer # }}} @@ -1406,6 +1524,27 @@ _add_functionality() # }}} +# {{{ _OverriddenArrayInterfaceSVMAllocation + +if get_cl_header_version() >= (2, 0): + class _OverriddenArrayInterfaceSVMAllocation(SVMAllocation): + def __init__(self, ctx, size, alignment, flags, *, _interface, + queue=None): + """ + :arg ctx: a :class:`Context` + :arg flags: some of :class:`svm_mem_flags`. + """ + super().__init__(ctx, size, alignment, flags, queue) + + # mem_flags.READ_ONLY applies to kernels, not the host + read_write = True + _interface["data"] = (int(self.svm_ptr), not read_write) + + self.__array_interface__ = _interface + +# }}} + + # {{{ create_some_context def create_some_context(interactive=None, answers=None): @@ -1546,19 +1685,24 @@ _csc = create_some_context class SVMMap: """ - .. attribute:: event + Returned by :func:`SVMPointer.map` and :func:`SVM.map`. 
+ This class may also be used as a context manager in a ``with`` statement. + :meth:`release` will be called upon exit from the ``with`` region. + The value returned to the ``as`` part of the context manager is the + mapped Python object (e.g. a :mod:`numpy` array). .. versionadded:: 2016.2 + .. property:: event + + The :class:`Event` returned when mapping the memory. + .. automethod:: release - This class may also be used as a context manager in a ``with`` statement. - :meth:`release` will be called upon exit from the ``with`` region. - The value returned to the ``as`` part of the context manager is the - mapped Python object (e.g. a :mod:`numpy` array). """ - def __init__(self, svm, queue, event): + def __init__(self, svm, array, queue, event): self.svm = svm + self.array = array self.queue = queue self.event = event @@ -1567,7 +1711,7 @@ class SVMMap: self.release() def __enter__(self): - return self.svm.mem + return self.array def __exit__(self, exc_type, exc_val, exc_tb): self.release() @@ -1712,7 +1856,7 @@ def enqueue_copy(queue, dest, src, **kwargs): three or shorter. (mandatory) .. ------------------------------------------------------------------------ - .. rubric :: Transfer :class:`SVM`/host ↔ :class:`SVM`/host + .. rubric :: Transfer :class:`SVMPointer`/host ↔ :class:`SVMPointer`/host .. ------------------------------------------------------------------------ :arg byte_count: (optional) If not specified, defaults to the @@ -1772,12 +1916,14 @@ def enqueue_copy(queue, dest, src, **kwargs): else: raise ValueError("invalid dest mem object type") - elif get_cl_header_version() >= (2, 0) and isinstance(dest, SVM): + elif get_cl_header_version() >= (2, 0) and isinstance(dest, SVMPointer): # to SVM - if not isinstance(src, SVM): + if not isinstance(src, SVMPointer): src = SVM(src) is_blocking = kwargs.pop("is_blocking", True) + assert kwargs.pop("src_offset", 0) == 0 + assert kwargs.pop("dest_offset", 0) == 0 return _cl._enqueue_svm_memcpy(queue, is_blocking, dest, src, **kwargs) else: @@ -1803,7 +1949,7 @@ def enqueue_copy(queue, dest, src, **kwargs): queue, src, origin, region, dest, **kwargs) else: raise ValueError("invalid src mem object type") - elif isinstance(src, SVM): + elif isinstance(src, SVMPointer): # from svm # dest is not a SVM instance, otherwise we'd be in the branch above is_blocking = kwargs.pop("is_blocking", True) @@ -1937,7 +2083,7 @@ def enqueue_fill_buffer(queue, mem, pattern, offset, size, wait_for=None): def enqueue_svm_memfill(queue, dest, pattern, byte_count=None, wait_for=None): """Fill shared virtual memory with a pattern. - :arg dest: a Python buffer object, optionally wrapped in an :class:`SVM` object + :arg dest: a Python buffer object, or any implementation of :class:`SVMPointer`. :arg pattern: a Python buffer object (e.g. a :class:`numpy.ndarray` with the fill pattern to be used. :arg byte_count: The size of the memory to be fill. Defaults to the @@ -1948,17 +2094,17 @@ def enqueue_svm_memfill(queue, dest, pattern, byte_count=None, wait_for=None): .. versionadded:: 2016.2 """ - if not isinstance(dest, SVM): + if not isinstance(dest, SVMPointer): dest = SVM(dest) return _cl._enqueue_svm_memfill( - queue, dest, pattern, byte_count=None, wait_for=None) + queue, dest, pattern, byte_count=byte_count, wait_for=wait_for) def enqueue_svm_migratemem(queue, svms, flags, wait_for=None): """ :arg svms: a collection of Python buffer objects (e.g. :mod:`numpy` - arrays), optionally wrapped in :class:`SVM` objects. 
+ arrays), or any implementation of :class:`SVMPointer`. :arg flags: a combination of :class:`mem_migration_flags` |std-enqueue-blurb| @@ -1968,15 +2114,10 @@ def enqueue_svm_migratemem(queue, svms, flags, wait_for=None): This function requires OpenCL 2.1. """ - return _cl._enqueue_svm_migratemem( - queue, - [svm.mem if isinstance(svm, SVM) else svm - for svm in svms], - flags, - wait_for) + return _cl._enqueue_svm_migratemem(queue, svms, flags, wait_for) -def svm_empty(ctx, flags, shape, dtype, order="C", alignment=None): +def svm_empty(ctx, flags, shape, dtype, order="C", alignment=None, queue=None): """Allocate an empty :class:`numpy.ndarray` of the given *shape*, *dtype* and *order*. (See :func:`numpy.empty` for the meaning of these arguments.) The array will be allocated in shared virtual memory belonging @@ -1994,6 +2135,10 @@ def svm_empty(ctx, flags, shape, dtype, order="C", alignment=None): will likely want to wrap the returned array in an :class:`SVM` tag. .. versionadded:: 2016.2 + + .. versionchanged:: 2022.2 + + *queue* argument added. """ dtype = np.dtype(dtype) @@ -2040,7 +2185,9 @@ def svm_empty(ctx, flags, shape, dtype, order="C", alignment=None): if alignment is None: alignment = itemsize - svm_alloc = SVMAllocation(ctx, nbytes, alignment, flags, _interface=interface) + svm_alloc = _OverriddenArrayInterfaceSVMAllocation( + ctx, nbytes, alignment, flags, _interface=interface, + queue=queue) return np.asarray(svm_alloc) diff --git a/pyopencl/array.py b/pyopencl/array.py index 15ed2bbbf281888b8625a19a7ff723c2b54c199c..80b1c61d1396eac1639ee77071cdd0fea66a1d2b 100644 --- a/pyopencl/array.py +++ b/pyopencl/array.py @@ -721,9 +721,14 @@ class Array: stacklevel=2) if self.size: - event1 = cl.enqueue_copy(queue or self.queue, self.base_data, ary, - device_offset=self.offset, - is_blocking=not async_) + if self.offset: + event1 = cl.enqueue_copy(queue or self.queue, self.base_data, ary, + device_offset=self.offset, + is_blocking=not async_) + else: + event1 = cl.enqueue_copy(queue or self.queue, self.base_data, ary, + is_blocking=not async_) + self.add_event(event1) def _get(self, queue=None, ary=None, async_=None, **kwargs): @@ -771,9 +776,14 @@ class Array: "to associate one.") if self.size: - event1 = cl.enqueue_copy(queue, ary, self.base_data, - device_offset=self.offset, - wait_for=self.events, is_blocking=not async_) + if self.offset: + event1 = cl.enqueue_copy(queue, ary, self.base_data, + device_offset=self.offset, + wait_for=self.events, is_blocking=not async_) + else: + event1 = cl.enqueue_copy(queue, ary, self.base_data, + wait_for=self.events, is_blocking=not async_) + self.add_event(event1) else: event1 = None diff --git a/pyopencl/tools.py b/pyopencl/tools.py index 27adac75bd2e7c9a355e876bb7912371e57beaf9..fb4a91e14f98d3cde4c6b68ceeee4d44979aa3e8 100644 --- a/pyopencl/tools.py +++ b/pyopencl/tools.py @@ -1,4 +1,92 @@ -"""Various helpful bits and pieces without much of a common theme.""" +r""" +.. _memory-pools: + +Memory Pools +------------ + +Memory allocation (e.g. in the form of the :func:`pyopencl.Buffer` constructor) +can be expensive if used frequently. For example, code based on +:class:`pyopencl.array.Array` can easily run into this issue because a fresh +memory area is allocated for each intermediate result. Memory pools are a +remedy for this problem based on the observation that often many of the block +allocations are of the same sizes as previously used ones. 
+ +Then, instead of fully returning the memory to the system and incurring the +associated reallocation overhead, the pool holds on to the memory and uses it +to satisfy future allocations of similarly-sized blocks. The pool reacts +appropriately to out-of-memory conditions as long as all memory allocations +are made through it. Allocations performed from outside of the pool may run +into spurious out-of-memory conditions due to the pool owning much or all of +the available memory. + +There are two flavors of allocators and memory pools: + +- :ref:`buf-mempool` +- :ref:`svm-mempool` + +Using :class:`pyopencl.array.Array`\ s can be used with memory pools in a +straightforward manner:: + + mem_pool = pyopencl.tools.MemoryPool(pyopencl.tools.ImmediateAllocator(queue)) + a_dev = cl_array.arange(queue, 2000, dtype=np.float32, allocator=mem_pool) + +Likewise, SVM-based allocators are directly usable with +:class:`pyopencl.array.Array`. + +.. _buf-mempool: + +:class:`~pyopencl.Buffer`-based Allocators and Memory Pools +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. autoclass:: PooledBuffer + +.. autoclass:: AllocatorBase + +.. autoclass:: DeferredAllocator + +.. autoclass:: ImmediateAllocator + +.. autoclass:: MemoryPool + +.. _svm-mempool: + +:ref:`SVM `-Based Allocators and Memory Pools +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +SVM functionality requires OpenCL 2.0. + +.. autoclass:: PooledSVM + +.. autoclass:: SVMAllocator + +.. autoclass:: SVMPool + +CL-Object-dependent Caching +--------------------------- + +.. autofunction:: first_arg_dependent_memoize +.. autofunction:: clear_first_arg_caches + +Testing +------- + +.. autofunction:: pytest_generate_tests_for_pyopencl + +Device Characterization +----------------------- + +.. automodule:: pyopencl.characterize + :members: + +Type aliases +------------ + +.. currentmodule:: pyopencl._cl + +.. class:: AllocatorBase + + See :class:`pyopencl.tools.AllocatorBase`. +""" __copyright__ = "Copyright (C) 2010 Andreas Kloeckner" @@ -33,7 +121,7 @@ from sys import intern import numpy as np from pytools import memoize, memoize_method -from pyopencl._cl import bitlog2 # noqa: F401 +from pyopencl._cl import bitlog2, get_cl_header_version # noqa: F401 from pytools.persistent_dict import KeyBuilder as KeyBuilderBase import re @@ -59,10 +147,293 @@ _register_types() # {{{ imported names from pyopencl._cl import ( # noqa - PooledBuffer as PooledBuffer, - _tools_DeferredAllocator as DeferredAllocator, - _tools_ImmediateAllocator as ImmediateAllocator, - MemoryPool as MemoryPool) + PooledBuffer, AllocatorBase, DeferredAllocator, + ImmediateAllocator, MemoryPool, + ) + + +if get_cl_header_version() >= (2, 0): + from pyopencl._cl import ( # noqa + SVMPool, + PooledSVM, + SVMAllocator, + ) + +# }}} + + +# {{{ monkeypatch docstrings into imported interfaces + +_MEMPOOL_IFACE_DOCS = """ +.. note:: + + The current implementation of the memory pool will retain allocated + memory after it is returned by the application and keep it in a bin + identified by the leading *leading_bits_in_bin_id* bits of the + allocation size. To ensure that allocations within each bin are + interchangeable, allocation sizes are rounded up to the largest size + that shares the leading bits of the requested allocation size. + + The current default value of *leading_bits_in_bin_id* is + four, but this may change in future versions and is not + guaranteed. + + *leading_bits_in_bin_id* must be passed by keyword, + and its role is purely advisory. 
It is not guaranteed + that future versions of the pool will use the + same allocation scheme and/or honor *leading_bits_in_bin_id*. + +.. attribute:: held_blocks + + The number of unused blocks being held by this pool. + +.. attribute:: active_blocks + + The number of blocks in active use that have been allocated + through this pool. + +.. attribute:: managed_bytes + + "Managed" memory is "active" and "held" memory. + + .. versionadded:: 2021.1.2 + +.. attribute:: active_bytes + + "Active" bytes are bytes under the control of the application. + This may be smaller than the actual allocated size reflected + in :attr:`managed_bytes`. + + .. versionadded:: 2021.1.2 + + +.. method:: free_held + + Free all unused memory that the pool is currently holding. + +.. method:: stop_holding + + Instruct the memory to start immediately freeing memory returned + to it, instead of holding it for future allocations. + Implicitly calls :meth:`free_held`. + This is useful as a cleanup action when a memory pool falls out + of use. +""" + + +def _monkeypatch_docstrings(): + + PooledBuffer.__doc__ = """ + An object representing a :class:`MemoryPool`-based allocation of + :class:`~pyopencl.Buffer`-style device memory. Analogous to + :class:`~pyopencl.Buffer`, however once this object is deleted, its + associated device memory is returned to the pool. + + Is a :class:`pyopencl.MemoryObject`. + """ + + AllocatorBase.__doc__ = """ + An interface implemented by various memory allocation functions + in :mod:`pyopencl`. + + .. automethod:: __call__ + + Allocate and return a :class:`pyopencl.Buffer` of the given *size*. + """ + + # {{{ DeferredAllocator + + DeferredAllocator.__doc__ = """ + *mem_flags* takes its values from :class:`pyopencl.mem_flags` and corresponds + to the *flags* argument of :class:`pyopencl.Buffer`. DeferredAllocator + has the same semantics as regular OpenCL buffer allocation, i.e. it may + promise memory to be available that may (in any call to a buffer-using + CL function) turn out to not exist later on. (Allocations in CL are + bound to contexts, not devices, and memory availability depends on which + device the buffer is used with.) + + Implements :class:`AllocatorBase`. + + .. versionchanged :: 2013.1 + + ``CLAllocator`` was deprecated and replaced + by :class:`DeferredAllocator`. + + .. method:: __init__(context, mem_flags=pyopencl.mem_flags.READ_WRITE) + + .. automethod:: __call__ + + Allocate a :class:`pyopencl.Buffer` of the given *size*. + + .. versionchanged :: 2020.2 + + The allocator will succeed even for allocations of size zero, + returning *None*. + """ + + # }}} + + # {{{ ImmediateAllocator + + ImmediateAllocator.__doc__ = """ + *mem_flags* takes its values from :class:`pyopencl.mem_flags` and corresponds + to the *flags* argument of :class:`pyopencl.Buffer`. + :class:`ImmediateAllocator` will attempt to ensure at allocation time that + allocated memory is actually available. If no memory is available, an + out-of-memory error is reported at allocation time. + + Implements :class:`AllocatorBase`. + + .. versionadded:: 2013.1 + + .. method:: __init__(queue, mem_flags=pyopencl.mem_flags.READ_WRITE) + + .. automethod:: __call__ + + Allocate a :class:`pyopencl.Buffer` of the given *size*. + + .. versionchanged :: 2020.2 + + The allocator will succeed even for allocations of size zero, + returning *None*. + """ + + # }}} + + # {{{ MemoryPool + + MemoryPool.__doc__ = """ + A memory pool for OpenCL device memory in :class:`pyopencl.Buffer` form. 
+ *allocator* must be an instance of one of the above classes, and should be + an :class:`ImmediateAllocator`. The memory pool assumes that allocation + failures are reported by the allocator immediately, and not in the + OpenCL-typical deferred manner. + + Implements :class:`AllocatorBase`. + + .. versionchanged:: 2019.1 + + Current bin allocation behavior documented, *leading_bits_in_bin_id* + added. + + .. automethod:: __init__ + + .. automethod:: allocate + + Return a :class:`PooledBuffer` of the given *size*. + + .. automethod:: __call__ + + Synonym for :meth:`allocate` to match :class:`AllocatorBase`. + + .. versionadded:: 2011.2 + """ + _MEMPOOL_IFACE_DOCS + + # }}} + + +_monkeypatch_docstrings() + + +def _monkeypatch_svm_docstrings(): + # {{{ PooledSVM + + PooledSVM.__doc__ = """ + An object representing a :class:`SVMPool`-based allocation of + :ref:`svm`. Analogous to :class:`~pyopencl.SVMAllocation`, however once + this object is deleted, its associated device memory is returned to the + pool from which it came. + + .. versionadded:: 2022.2 + + .. note:: + + If the :class:`SVMAllocator` for the :class:`SVMPool` that allocated an + object of this type is associated with an (in-order) + :class:`~pyopencl.CommandQueue`, sufficient synchronization is provided + to ensure operations enqueued before deallocation complete before + operations from a different use (possibly in a different queue) are + permitted to start. This applies when :class:`release` is called and + also when the object is freed automatically by the garbage collector. + + Is a :class:`pyopencl.SVMPointer`. + + Supports structural equality and hashing. + + .. automethod:: release + + Return the held memory to the pool. See the note about synchronization + behavior during deallocation above. + + .. automethod:: enqueue_release + + Synonymous to :meth;`release`, for consistency with + :class:`~pyopencl.SVMAllocation`. Note that, unlike + :meth:`pyopencl.SVMAllocation.enqueue_release`, specifying a queue + or events to be waited for is not supported. + + .. automethod:: bind_to_queue + + Analogous to :meth:`pyopencl.SVMAllocation.bind_to_queue`. + + .. automethod:: unbind_from_queue + + Analogous to :meth:`pyopencl.SVMAllocation.unbind_from_queue`. + """ + + # }}} + + # {{{ SVMAllocator + + SVMAllocator.__doc__ = """ + .. versionadded:: 2022.2 + + .. automethod:: __init__ + + :arg flags: See :class:`~pyopencl.svm_mem_flags`. + :arg queue: If not specified, allocations will be freed + eagerly, irrespective of whether pending/enqueued operations + are still using the memory. + + If specified, deallocation of memory will be enqueued + with the given queue, and will only be performed + after previously-enqueue operations in the queue have + completed. + + It is an error to specify an out-of-order queue. + + .. warning:: + + Not specifying a queue will typically lead to undesired + behavior, including crashes and memory corruption. + See the warning in :ref:`svm`. + + .. automethod:: __call__ + + Return a :class:`~pyopencl.SVMAllocation` of the given *size*. + """ + + # }}} + + # {{{ SVMPool + + SVMPool.__doc__ = """ + A memory pool for OpenCL device memory in :ref:`SVM ` form. + *allocator* must be an instance of :class:`SVMAllocator`. + + .. versionadded:: 2022.2 + + .. automethod:: __init__ + .. automethod:: __call__ + + Return a :class:`PooledSVM` of the given *size*. 
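+
+    A minimal usage sketch (assuming an existing context ``ctx`` and an
+    in-order queue ``queue``; keyword names follow the arguments
+    documented above)::
+
+        alloc = pyopencl.tools.SVMAllocator(
+            ctx, flags=pyopencl.svm_mem_flags.READ_WRITE, queue=queue)
+        svm_pool = pyopencl.tools.SVMPool(alloc)
+        a_dev = cl_array.arange(queue, 2000, dtype=np.float32,
+                                allocator=svm_pool)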
+ """ + _MEMPOOL_IFACE_DOCS + + # }}} + + +if get_cl_header_version() >= (2, 0): + _monkeypatch_svm_docstrings() # }}} @@ -310,6 +681,22 @@ def get_pyopencl_fixture_arg_values(): def pytest_generate_tests_for_pyopencl(metafunc): + """Using the line:: + + from pyopencl.tools import pytest_generate_tests_for_pyopencl + as pytest_generate_tests + + in your `pytest `_ test scripts allows you to use the + arguments *ctx_factory*, *device*, or *platform* in your test functions, + and they will automatically be run for each OpenCL device/platform in the + system, as appropriate. + + The following two environment variabls is also supported to control + device/platform choice:: + + PYOPENCL_TEST=0:0,1;intel=i5,i7 + """ + arg_names = get_pyopencl_fixture_arg_names(metafunc) if not arg_names: return @@ -605,7 +992,7 @@ def match_dtype_to_c_struct(device, name, dtype, context=None): the given *device* to ensure that :mod:`numpy` and C offsets and sizes match.) - .. versionadded: 2013.1 + .. versionadded:: 2013.1 This example explains the use of this function:: diff --git a/src/mempool.hpp b/src/mempool.hpp index 44f0fd64398509132a1dfef917540a3f8fd6de77..a0eca827e704020dc4248b9b00d064e9041b6993 100644 --- a/src/mempool.hpp +++ b/src/mempool.hpp @@ -102,7 +102,7 @@ namespace PYGPU_PACKAGE container_t m_container; typedef typename container_t::value_type bin_pair_t; - std::unique_ptr m_allocator; + std::shared_ptr m_allocator; // A held block is one that's been released by the application, but that // we are keeping around to dish out again. @@ -125,8 +125,8 @@ namespace PYGPU_PACKAGE unsigned m_leading_bits_in_bin_id; public: - memory_pool(Allocator const &alloc=Allocator(), unsigned leading_bits_in_bin_id=4) - : m_allocator(alloc.copy()), + memory_pool(std::shared_ptr alloc, unsigned leading_bits_in_bin_id=4) + : m_allocator(alloc), m_held_blocks(0), m_active_blocks(0), m_managed_bytes(0), m_active_bytes(0), m_stop_holding(false), @@ -233,7 +233,8 @@ namespace PYGPU_PACKAGE std::cout << "[pool] allocation of size " << size << " served from bin " << bin_nr << " which contained " << bin.size() << " entries" << std::endl; - return pop_block_from_bin(bin, size); + return m_allocator->hand_out_existing_block( + pop_block_from_bin(bin, size)); } size_type alloc_sz = alloc_size(bin_nr); @@ -256,7 +257,8 @@ namespace PYGPU_PACKAGE m_allocator->try_release_blocks(); if (bin.size()) - return pop_block_from_bin(bin, size); + return m_allocator->hand_out_existing_block( + pop_block_from_bin(bin, size)); if (m_trace) std::cout << "[pool] allocation still OOM after GC" << std::endl; @@ -282,7 +284,7 @@ namespace PYGPU_PACKAGE "failed to free memory for allocation"); } - void free(pointer_type p, size_type size) + void free(pointer_type &&p, size_type size) { --m_active_blocks; m_active_bytes -= size; @@ -291,7 +293,7 @@ namespace PYGPU_PACKAGE if (!m_stop_holding) { inc_held_blocks(); - get_bin(bin_nr).push_back(p); + get_bin(bin_nr).push_back(std::move(p)); if (m_trace) std::cout << "[pool] block of size " << size << " returned to bin " @@ -300,7 +302,7 @@ namespace PYGPU_PACKAGE } else { - m_allocator->free(p); + m_allocator->free(std::move(p)); m_managed_bytes -= alloc_size(bin_nr); } } @@ -313,7 +315,7 @@ namespace PYGPU_PACKAGE while (bin.size()) { - m_allocator->free(bin.back()); + m_allocator->free(std::move(bin.back())); m_managed_bytes -= alloc_size(bin_pair.first); bin.pop_back(); @@ -353,7 +355,7 @@ namespace PYGPU_PACKAGE if (bin.size()) { - m_allocator->free(bin.back()); + 
m_allocator->free(std::move(bin.back())); m_managed_bytes -= alloc_size(bin_pair.first); bin.pop_back(); @@ -379,7 +381,7 @@ namespace PYGPU_PACKAGE pointer_type pop_block_from_bin(bin_t &bin, size_type size) { - pointer_type result = bin.back(); + pointer_type result(std::move(bin.back())); bin.pop_back(); dec_held_blocks(); @@ -399,7 +401,7 @@ namespace PYGPU_PACKAGE typedef typename Pool::pointer_type pointer_type; typedef typename Pool::size_type size_type; - private: + protected: PYGPU_SHARED_PTR m_pool; pointer_type m_ptr; @@ -421,7 +423,7 @@ namespace PYGPU_PACKAGE { if (m_valid) { - m_pool->free(m_ptr, m_size); + m_pool->free(std::move(m_ptr), m_size); m_valid = false; } else @@ -435,16 +437,8 @@ namespace PYGPU_PACKAGE #endif ); } - - pointer_type ptr() const - { return m_ptr; } - - size_type size() const - { return m_size; } }; } - - #endif diff --git a/src/wrap_cl.hpp b/src/wrap_cl.hpp index f7f87a8a7a9d6cde35f3647633083e7f9b8aa02f..5bebef66eef7b96b56ae2aee4242eabd9c685688 100644 --- a/src/wrap_cl.hpp +++ b/src/wrap_cl.hpp @@ -227,8 +227,6 @@ } - - #define PYOPENCL_RETRY_IF_MEM_ERROR(OPERATION) \ { \ bool failed_with_mem_error = false; \ @@ -258,6 +256,17 @@ } \ } + +#define PYOPENCL_GET_SVM_SIZE(NAME) \ + size_t NAME##_size; \ + bool NAME##_has_size = false; \ + try \ + { \ + NAME##_size = NAME.size(); \ + NAME##_has_size = true; \ + } \ + catch (size_not_available) { } + // }}} @@ -3552,11 +3561,26 @@ namespace pyopencl // }}} - // {{{ svm - #if PYOPENCL_CL_VERSION >= 0x2000 - class svm_arg_wrapper + // {{{ svm pointer + + class size_not_available { }; + + class svm_pointer + { + public: + virtual void *svm_ptr() const = 0; + // may throw size_not_available + virtual size_t size() const = 0; + }; + + // }}} + + + // {{{ svm_arg_wrapper + + class svm_arg_wrapper : public svm_pointer { private: void *m_ptr; @@ -3579,7 +3603,7 @@ namespace pyopencl m_size = ward->m_buf.len; } - void *ptr() const + void *svm_ptr() const { return m_ptr; } @@ -3589,17 +3613,34 @@ namespace pyopencl } }; + // }}} + - class svm_allocation : noncopyable + // {{{ svm_allocation + + class svm_allocation : public svm_pointer { private: std::shared_ptr m_context; void *m_allocation; + size_t m_size; + command_queue_ref m_queue; + // FIXME Should maybe also allow keeping a list of events so that we can + // wait for users to finish in the case of out-of-order queues. 
public: - svm_allocation(std::shared_ptr const &ctx, size_t size, cl_uint alignment, cl_svm_mem_flags flags) - : m_context(ctx) + svm_allocation(std::shared_ptr const &ctx, size_t size, cl_uint alignment, + cl_svm_mem_flags flags, const command_queue *queue = nullptr) + : m_context(ctx), m_size(size) { + if (queue) + { + m_queue.set(queue->data()); + if (is_queue_out_of_order(m_queue.data())) + throw error("SVMAllocation.__init__", CL_INVALID_VALUE, + "supplying an out-of-order queue to SVMAllocation is invalid"); + } + PYOPENCL_PRINT_CALL_TRACE("clSVMalloc"); m_allocation = clSVMAlloc( ctx->data(), @@ -3609,6 +3650,25 @@ namespace pyopencl throw pyopencl::error("clSVMAlloc", CL_OUT_OF_RESOURCES); } + svm_allocation(std::shared_ptr const &ctx, void *allocation, size_t size, + const cl_command_queue queue) + : m_context(ctx), m_allocation(allocation), m_size(size) + { + if (queue) + { + if (is_queue_out_of_order(queue)) + { + release(); + throw error("SVMAllocation.__init__", CL_INVALID_VALUE, + "supplying an out-of-order queue to SVMAllocation is invalid"); + } + m_queue.set(queue); + } + } + + svm_allocation(const svm_allocation &) = delete; + svm_allocation &operator=(const svm_allocation &) = delete; + ~svm_allocation() { if (m_allocation) @@ -3621,36 +3681,62 @@ namespace pyopencl throw error("SVMAllocation.release", CL_INVALID_VALUE, "trying to double-unref svm allocation"); - clSVMFree(m_context->data(), m_allocation); + if (m_queue.is_valid()) + { + PYOPENCL_CALL_GUARDED_CLEANUP(clEnqueueSVMFree, ( + m_queue.data(), 1, &m_allocation, + nullptr, nullptr, + 0, nullptr, nullptr)); + m_queue.reset(); + } + else + { + PYOPENCL_PRINT_CALL_TRACE("clSVMFree"); + clSVMFree(m_context->data(), m_allocation); + } m_allocation = nullptr; } - void enqueue_release(command_queue &queue, py::object py_wait_for) + event *enqueue_release(command_queue *queue, py::object py_wait_for) { PYOPENCL_PARSE_WAIT_FOR; if (!m_allocation) - throw error("SVMAllocation.release", CL_INVALID_VALUE, - "trying to double-unref svm allocation"); + throw error("SVMAllocation.enqueue_release", CL_INVALID_VALUE, + "trying to enqueue_release on an already-freed allocation"); + + cl_command_queue use_queue; + if (queue) + use_queue = queue->data(); + else + { + if (m_queue.is_valid()) + use_queue = m_queue.data(); + else + throw error("SVMAllocation.enqueue_release", CL_INVALID_VALUE, + "no implicit queue available, must be provided explicitly"); + } cl_event evt; PYOPENCL_CALL_GUARDED_CLEANUP(clEnqueueSVMFree, ( - queue.data(), 1, &m_allocation, + use_queue, 1, &m_allocation, nullptr, nullptr, PYOPENCL_WAITLIST_ARGS, &evt)); m_allocation = nullptr; + + PYOPENCL_RETURN_NEW_EVENT(evt); } - void *ptr() const + void *svm_ptr() const { return m_allocation; } - intptr_t ptr_as_int() const + size_t size() const { - return (intptr_t) m_allocation; + return m_size; } bool operator==(svm_allocation const &other) const @@ -3662,22 +3748,99 @@ namespace pyopencl { return m_allocation != other.m_allocation; } + + void bind_to_queue(command_queue const &queue) + { + if (is_queue_out_of_order(queue.data())) + throw error("SVMAllocation.bind_to_queue", CL_INVALID_VALUE, + "supplying an out-of-order queue to SVMAllocation is invalid"); + + if (m_queue.is_valid()) + { + if (m_queue.data() != queue.data()) + { + // make sure synchronization promises stay valid in new queue + cl_event evt; + + PYOPENCL_CALL_GUARDED(clEnqueueMarker, (m_queue.data(), &evt)); + PYOPENCL_CALL_GUARDED(clEnqueueMarkerWithWaitList, + (queue.data(), 1, &evt, 
nullptr)); + } + } + + m_queue.set(queue.data()); + } + + void unbind_from_queue() + { + if (m_queue.is_valid()) + PYOPENCL_CALL_GUARDED_THREADED(clFinish, (m_queue.data())); + + m_queue.reset(); + } }; + // }}} + + + // {{{ svm operations inline event *enqueue_svm_memcpy( command_queue &cq, cl_bool is_blocking, - svm_arg_wrapper &dst, svm_arg_wrapper &src, - py::object py_wait_for + svm_pointer &dst, svm_pointer &src, + py::object py_wait_for, + py::object byte_count_py ) { PYOPENCL_PARSE_WAIT_FOR; - if (src.size() != dst.size()) + // {{{ process size + + PYOPENCL_GET_SVM_SIZE(src); + PYOPENCL_GET_SVM_SIZE(dst); + + size_t size; + bool have_size = false; + + if (src_has_size) + { + size = src_size; + have_size = true; + } + if (dst_has_size) + { + if (have_size) + { + if (!byte_count_py.is_none()) + size = std::min(size, dst_size); + else if (size != dst_size) + throw error("_enqueue_svm_memcpy", CL_INVALID_VALUE, + "sizes of source and destination buffer do not match"); + } + else + { + size = dst_size; + have_size = true; + } + } + + if (!byte_count_py.is_none()) + { + size_t byte_count = byte_count_py.cast(); + if (have_size && byte_count > size) + throw error("_enqueue_svm_memcpy", CL_INVALID_VALUE, + "specified byte_count larger than size of source or destination buffers"); + size = byte_count; + have_size = true; + } + + if (!have_size) throw error("_enqueue_svm_memcpy", CL_INVALID_VALUE, - "sizes of source and destination buffer do not match"); + "size not passed and could not be determined"); + + // }}} cl_event evt; PYOPENCL_CALL_GUARDED( @@ -3685,8 +3848,8 @@ namespace pyopencl ( cq.data(), is_blocking, - dst.ptr(), src.ptr(), - dst.size(), + dst.svm_ptr(), src.svm_ptr(), + size, PYOPENCL_WAITLIST_ARGS, &evt )); @@ -3698,7 +3861,7 @@ namespace pyopencl inline event *enqueue_svm_memfill( command_queue &cq, - svm_arg_wrapper &dst, py::object py_pattern, + svm_pointer &dst, py::object py_pattern, py::object byte_count, py::object py_wait_for ) @@ -3715,18 +3878,41 @@ namespace pyopencl pattern_ptr = pattern_ward->m_buf.buf; pattern_len = pattern_ward->m_buf.len; - size_t fill_size = dst.size(); + // {{{ process size + + PYOPENCL_GET_SVM_SIZE(dst); + + size_t size; + bool have_size = false; + if (dst_has_size) + { + size = dst_size; + have_size = true; + } if (!byte_count.is_none()) - fill_size = py::cast(byte_count); + { + size_t user_size = py::cast(byte_count); + if (have_size && user_size > size) + throw error("enqueue_svm_memfill", CL_INVALID_VALUE, + "byte_count too large for specified SVM buffer"); + } + + if (!have_size) + { + throw error("enqueue_svm_memfill", CL_INVALID_VALUE, + "byte_count not passed and could not be determined"); + } + + // }}} cl_event evt; PYOPENCL_CALL_GUARDED( clEnqueueSVMMemFill, ( cq.data(), - dst.ptr(), pattern_ptr, + dst.svm_ptr(), pattern_ptr, pattern_len, - fill_size, + size, PYOPENCL_WAITLIST_ARGS, &evt )); @@ -3740,12 +3926,40 @@ namespace pyopencl command_queue &cq, cl_bool is_blocking, cl_map_flags flags, - svm_arg_wrapper &svm, - py::object py_wait_for + svm_pointer &svm, + py::object py_wait_for, + py::object user_size_py ) { PYOPENCL_PARSE_WAIT_FOR; + // {{{ process size + + PYOPENCL_GET_SVM_SIZE(svm); + + size_t size; + bool have_size = false; + if (svm_has_size) + { + size = svm_size; + have_size = true; + } + if (!user_size_py.is_none()) + { + size_t user_size = py::cast(user_size_py); + if (have_size && user_size > size) + throw error("enqueue_svm_memfill", CL_INVALID_VALUE, + "user-provided size too large for specified SVM buffer"); 
+ } + + if (!have_size) + { + throw error("enqueue_svm_mem_map", CL_INVALID_VALUE, + "size not passed and could not be determined"); + } + + // }}} + cl_event evt; PYOPENCL_CALL_GUARDED( clEnqueueSVMMap, @@ -3753,7 +3967,7 @@ namespace pyopencl cq.data(), is_blocking, flags, - svm.ptr(), svm.size(), + svm.svm_ptr(), size, PYOPENCL_WAITLIST_ARGS, &evt )); @@ -3765,7 +3979,7 @@ namespace pyopencl inline event *enqueue_svm_unmap( command_queue &cq, - svm_arg_wrapper &svm, + svm_pointer &svm, py::object py_wait_for ) { @@ -3776,7 +3990,7 @@ namespace pyopencl clEnqueueSVMUnmap, ( cq.data(), - svm.ptr(), + svm.svm_ptr(), PYOPENCL_WAITLIST_ARGS, &evt )); @@ -3802,9 +4016,9 @@ namespace pyopencl for (py::handle py_svm: svms) { - svm_arg_wrapper &svm(py::cast(py_svm)); + svm_pointer &svm(py::cast(py_svm)); - svm_pointers.push_back(svm.ptr()); + svm_pointers.push_back(svm.svm_ptr()); sizes.push_back(svm.size()); } @@ -4597,10 +4811,10 @@ namespace pyopencl } #if PYOPENCL_CL_VERSION >= 0x2000 - void set_arg_svm(cl_uint arg_index, svm_arg_wrapper const &wrp) + void set_arg_svm(cl_uint arg_index, svm_pointer const &wrp) { PYOPENCL_CALL_GUARDED(clSetKernelArgSVMPointer, - (m_kernel, arg_index, wrp.ptr())); + (m_kernel, arg_index, wrp.svm_ptr())); } #endif @@ -4622,7 +4836,7 @@ namespace pyopencl #if PYOPENCL_CL_VERSION >= 0x2000 try { - set_arg_svm(arg_index, arg.cast()); + set_arg_svm(arg_index, arg.cast()); return; } catch (py::cast_error &) { } diff --git a/src/wrap_cl_part_2.cpp b/src/wrap_cl_part_2.cpp index 0c9a0d1b1eb168b631d30aa5965d6a9f58e6d105..33cc6ce30219b597c9d9698cc6b1d36b2ff35c21 100644 --- a/src/wrap_cl_part_2.cpp +++ b/src/wrap_cl_part_2.cpp @@ -24,6 +24,7 @@ // OTHER DEALINGS IN THE SOFTWARE. +#include #define NO_IMPORT_ARRAY #define PY_ARRAY_UNIQUE_SYMBOL pyopencl_ARRAY_API @@ -64,6 +65,22 @@ namespace pyopencl { } #endif + +#if PYOPENCL_CL_VERSION >= 0x2000 + class svm_pointer_as_buffer + { + private: + svm_pointer &m_ptr; + + public: + svm_pointer_as_buffer(svm_pointer &ptr) + : m_ptr(ptr) + { } + + svm_pointer &ptr() const + { return m_ptr; } + }; +#endif } @@ -292,37 +309,119 @@ void pyopencl_expose_part_2(py::module &m) // }}} - // {{{ svm + // {{{ svm_pointer #if PYOPENCL_CL_VERSION >= 0x2000 + { + typedef svm_pointer cls; + py::class_(m, "SVMPointer", py::dynamic_attr()) + // For consistency, it may seem appropriate to use int_ptr here, but + // that would work on both buffers and SVM, and passing a buffer pointer to + // a kernel is going to lead to a bad time. 
+ .def_property_readonly("svm_ptr", + [](cls &self) { return (intptr_t) self.svm_ptr(); }) + .def_property_readonly("size", [](cls &self) -> py::object + { + try + { + return py::cast(self.size()); + } + catch (size_not_available) + { + return py::none(); + } + }) + .def_property_readonly("buf", [](cls &self) -> svm_pointer_as_buffer * { + return new svm_pointer_as_buffer(self); + }, py::return_value_policy::reference_internal) + ; + } + + { + typedef svm_pointer_as_buffer cls; + py::class_(m, "_SVMPointerAsBuffer", pybind11::buffer_protocol()) + .def_buffer([](cls &self) -> pybind11::buffer_info + { + size_t size; + try + { + size = self.ptr().size(); + } + catch (size_not_available) + { + throw pyopencl::error("SVMPointer buffer protocol", CL_INVALID_VALUE, + "size of SVM is not known"); + } + return pybind11::buffer_info( + // Pointer to buffer + self.ptr().svm_ptr(), + // Size of one scalar + sizeof(unsigned char), + // Python struct-style format descriptor + pybind11::format_descriptor::format(), + // Number of dimensions + 1, + // Buffer dimensions + { size }, + // Strides (in bytes) for each index + { sizeof(unsigned char) } + ); + }) + ; + } + + // }}} + + // {{{ svm_arg_wrapper + { typedef svm_arg_wrapper cls; - py::class_(m, "SVM", py::dynamic_attr()) + py::class_(m, "SVM", py::dynamic_attr()) .def(py::init()) ; } + // }}} + + // {{{ svm_allocation + { typedef svm_allocation cls; - py::class_(m, "SVMAllocation", py::dynamic_attr()) - .def(py::init, size_t, cl_uint, cl_svm_mem_flags>()) + py::class_(m, "SVMAllocation", py::dynamic_attr()) + .def(py::init, size_t, cl_uint, cl_svm_mem_flags, const command_queue *>(), + py::arg("context"), + py::arg("size"), + py::arg("alignment"), + py::arg("flags"), + py::arg("queue").none(true)=py::none() + ) .DEF_SIMPLE_METHOD(release) .def("enqueue_release", &cls::enqueue_release, ":returns: a :class:`pyopencl.Event`\n\n" - "|std-enqueue-blurb|") - .def("_ptr_as_int", &cls::ptr_as_int) + "|std-enqueue-blurb|", + py::arg("queue").none(true)=py::none(), + py::arg("wait_for").none(true)=py::none() + ) .def(py::self == py::self) .def(py::self != py::self) - .def("__hash__", &cls::ptr_as_int) + .def("__hash__", [](cls &self) { return (intptr_t) self.svm_ptr(); }) + .def("bind_to_queue", &cls::bind_to_queue, + py::arg("queue")) + .DEF_SIMPLE_METHOD(unbind_from_queue) ; } + // }}} + + // {{{ svm operations + m.def("_enqueue_svm_memcpy", enqueue_svm_memcpy, py::arg("queue"), py::arg("is_blocking"), py::arg("dst"), py::arg("src"), - py::arg("wait_for")=py::none() + py::arg("wait_for")=py::none(), + py::arg("byte_count")=py::none() ); m.def("_enqueue_svm_memfill", enqueue_svm_memfill, @@ -338,7 +437,8 @@ void pyopencl_expose_part_2(py::module &m) py::arg("is_blocking"), py::arg("flags"), py::arg("svm"), - py::arg("wait_for")=py::none() + py::arg("wait_for")=py::none(), + py::arg("size")=py::none() ); m.def("_enqueue_svm_unmap", enqueue_svm_unmap, diff --git a/src/wrap_mempool.cpp b/src/wrap_mempool.cpp index 8514f1fab8ef105478ab1bc448cb6f0c7b54e1ca..3ba6fb607ce1d4e0a603ee08e25831f419484f53 100644 --- a/src/wrap_mempool.cpp +++ b/src/wrap_mempool.cpp @@ -40,46 +40,53 @@ -namespace -{ +namespace pyopencl { + // {{{ test_allocator + class test_allocator { public: typedef void *pointer_type; typedef size_t size_type; - virtual test_allocator *copy() const + bool is_deferred() const { - return new test_allocator(); + return false; } - virtual bool is_deferred() const + pointer_type allocate(size_type s) { - return false; + return nullptr; } - virtual 
pointer_type allocate(size_type s) + + pointer_type hand_out_existing_block(pointer_type &&p) { - return nullptr; + return p; } - void free(pointer_type p) + ~test_allocator() + { } + + void free(pointer_type &&p) { } void try_release_blocks() { } }; + // }}} + - // {{{ cl allocators + // {{{ buffer allocators - class cl_allocator_base + class buffer_allocator_base { protected: std::shared_ptr m_context; cl_mem_flags m_flags; public: - cl_allocator_base(std::shared_ptr const &ctx, + buffer_allocator_base(std::shared_ptr const &ctx, cl_mem_flags flags=CL_MEM_READ_WRITE) : m_context(ctx), m_flags(flags) { @@ -88,21 +95,25 @@ namespace "cannot specify USE_HOST_PTR or COPY_HOST_PTR flags"); } - cl_allocator_base(cl_allocator_base const &src) + buffer_allocator_base(buffer_allocator_base const &src) : m_context(src.m_context), m_flags(src.m_flags) { } - virtual ~cl_allocator_base() + virtual ~buffer_allocator_base() { } typedef cl_mem pointer_type; typedef size_t size_type; - virtual cl_allocator_base *copy() const = 0; virtual bool is_deferred() const = 0; virtual pointer_type allocate(size_type s) = 0; - void free(pointer_type p) + pointer_type hand_out_existing_block(pointer_type &&p) + { + return p; + } + + void free(pointer_type &&p) { PYOPENCL_CALL_GUARDED(clReleaseMemObject, (p)); } @@ -113,22 +124,18 @@ namespace } }; - class cl_deferred_allocator : public cl_allocator_base + + class deferred_buffer_allocator : public buffer_allocator_base { private: - typedef cl_allocator_base super; + typedef buffer_allocator_base super; public: - cl_deferred_allocator(std::shared_ptr const &ctx, + deferred_buffer_allocator(std::shared_ptr const &ctx, cl_mem_flags flags=CL_MEM_READ_WRITE) : super(ctx, flags) { } - cl_allocator_base *copy() const - { - return new cl_deferred_allocator(*this); - } - bool is_deferred() const { return true; } @@ -143,28 +150,23 @@ namespace const unsigned zero = 0; - class cl_immediate_allocator : public cl_allocator_base + class immediate_buffer_allocator : public buffer_allocator_base { private: - typedef cl_allocator_base super; + typedef buffer_allocator_base super; pyopencl::command_queue m_queue; public: - cl_immediate_allocator(pyopencl::command_queue &queue, + immediate_buffer_allocator(pyopencl::command_queue &queue, cl_mem_flags flags=CL_MEM_READ_WRITE) : super(std::shared_ptr(queue.get_context()), flags), m_queue(queue.data(), /*retain*/ true) { } - cl_immediate_allocator(cl_immediate_allocator const &src) + immediate_buffer_allocator(immediate_buffer_allocator const &src) : super(src), m_queue(src.m_queue) { } - cl_allocator_base *copy() const - { - return new cl_immediate_allocator(*this); - } - bool is_deferred() const { return false; } @@ -215,10 +217,42 @@ namespace // }}} - // {{{ allocator_call + // {{{ pooled_buffer + + class pooled_buffer + : public pyopencl::pooled_allocation >, + public pyopencl::memory_object_holder + { + private: + typedef + pyopencl::pooled_allocation > + super; + + public: + pooled_buffer( + std::shared_ptr p, super::size_type s) + : super(p, s) + { } + + virtual ~pooled_buffer() + { } + + const super::pointer_type data() const + { return m_ptr; } + + size_t size() const + { + return m_size; + } + }; + + // }}} + + + // {{{ allocate_from_buffer_allocator inline - pyopencl::buffer *allocator_call(cl_allocator_base &alloc, size_t size) + buffer *allocate_from_buffer_allocator(buffer_allocator_base &alloc, size_t size) { cl_mem mem; int try_count = 0; @@ -263,45 +297,249 @@ namespace // }}} - // {{{ pooled_buffer + // {{{ 
allocate_from_buffer_pool
-
-  class pooled_buffer
-    : public pyopencl::pooled_allocation >,
-    public pyopencl::memory_object_holder
+  pooled_buffer *allocate_from_buffer_pool(
+      std::shared_ptr > pool,
+      memory_pool::size_type sz)
+  {
+    return new pooled_buffer(pool, sz);
+  }
+
+  // }}}
+
+
+#if PYOPENCL_CL_VERSION >= 0x2000
+
+  struct svm_held_pointer
+  {
+    void *ptr;
+    pyopencl::command_queue_ref queue;
+  };
+
+
+  // {{{ svm allocator
+
+  class svm_allocator
+  {
+    public:
+      typedef svm_held_pointer pointer_type;
+      typedef size_t size_type;
+
+    protected:
+      std::shared_ptr m_context;
+      cl_uint m_alignment;
+      cl_svm_mem_flags m_flags;
+      pyopencl::command_queue_ref m_queue;
+
+    public:
+      svm_allocator(std::shared_ptr const &ctx,
+          cl_uint alignment=0, cl_svm_mem_flags flags=CL_MEM_READ_WRITE,
+          pyopencl::command_queue *queue=nullptr)
+        : m_context(ctx), m_alignment(alignment), m_flags(flags)
+      {
+        if (queue)
+          m_queue.set(queue->data());
+      }
+
+      svm_allocator(svm_allocator const &src)
+        : m_context(src.m_context), m_alignment(src.m_alignment),
+        m_flags(src.m_flags)
+      { }
+
+      ~svm_allocator()
+      { }
+
+      bool is_deferred() const
+      {
+        // According to experiments with the Nvidia implementation (and based
+        // on my reading of the CL spec), clSVMAlloc will return an error
+        // immediately upon being out of memory. Therefore the
+        // immediate/deferred split on the buffer side is not needed here.
+        // -AK, 2022-09-07
+
+        return false;
+      }
+
+      std::shared_ptr context() const
+      {
+        return m_context;
+      }
+
+      pointer_type allocate(size_type size)
+      {
+        if (size == 0)
+          return { nullptr, nullptr };
+
+        PYOPENCL_PRINT_CALL_TRACE("clSVMalloc");
+        return {
+          clSVMAlloc(m_context->data(), m_flags, size, m_alignment),
+          pyopencl::command_queue_ref(m_queue.is_valid() ?
m_queue.data() : nullptr) + }; + } + + pointer_type hand_out_existing_block(pointer_type &&p) + { + if (m_queue.is_valid()) + { + if (p.queue.is_valid()) + { + if (p.queue.data() != m_queue.data()) + { + // make sure synchronization promises stay valid in new queue + cl_event evt; + + PYOPENCL_CALL_GUARDED(clEnqueueMarker, (p.queue.data(), &evt)); + PYOPENCL_CALL_GUARDED(clEnqueueMarkerWithWaitList, + (m_queue.data(), 1, &evt, nullptr)); + } + } + p.queue.set(m_queue.data()); + } + else + { + if (p.queue.is_valid()) + { + PYOPENCL_CALL_GUARDED_THREADED(clFinish, (p.queue.data())); + p.queue.reset(); + } + } + + return std::move(p); + } + + void free(pointer_type &&p) + { + if (p.queue.is_valid()) + { + PYOPENCL_CALL_GUARDED_CLEANUP(clEnqueueSVMFree, ( + p.queue.data(), 1, &p.ptr, + nullptr, nullptr, + 0, nullptr, nullptr)); + p.queue.reset(); + } + else + { + PYOPENCL_PRINT_CALL_TRACE("clSVMFree"); + clSVMFree(m_context->data(), p.ptr); + } + } + + void try_release_blocks() + { + pyopencl::run_python_gc(); + } + }; + + // }}} + + + // {{{ pooled_svm + + class pooled_svm + : public pyopencl::pooled_allocation>, + public pyopencl::svm_pointer { private: typedef - pyopencl::pooled_allocation > + pyopencl::pooled_allocation> super; public: - pooled_buffer( + pooled_svm( std::shared_ptr p, super::size_type s) : super(p, s) { } - const super::pointer_type data() const - { return ptr(); } + void *svm_ptr() const + { return m_ptr.ptr; } + + size_t size() const + { return m_size; } + + void bind_to_queue(pyopencl::command_queue const &queue) + { + if (pyopencl::is_queue_out_of_order(queue.data())) + throw pyopencl::error("PooledSVM.bind_to_queue", CL_INVALID_VALUE, + "supplying an out-of-order queue to SVMAllocation is invalid"); + + if (m_ptr.queue.is_valid()) + { + if (m_ptr.queue.data() != queue.data()) + { + // make sure synchronization promises stay valid in new queue + cl_event evt; + + PYOPENCL_CALL_GUARDED(clEnqueueMarker, (m_ptr.queue.data(), &evt)); + PYOPENCL_CALL_GUARDED(clEnqueueMarkerWithWaitList, + (queue.data(), 1, &evt, nullptr)); + } + } + + m_ptr.queue.set(queue.data()); + } + + void unbind_from_queue() + { + if (m_ptr.queue.is_valid()) + PYOPENCL_CALL_GUARDED_THREADED(clFinish, (m_ptr.queue.data())); + + m_ptr.queue.reset(); + } }; // }}} - // {{{{ device_pool_allocate + // {{{ svm_allocator_call - pooled_buffer *device_pool_allocate( - std::shared_ptr > pool, - pyopencl::memory_pool::size_type sz) + inline + pyopencl::svm_allocation *svm_allocator_call(svm_allocator &alloc, size_t size) { - return new pooled_buffer(pool, sz); + int try_count = 0; + while (true) + { + try + { + svm_held_pointer mem(alloc.allocate(size)); + if (mem.queue.is_valid()) + return new pyopencl::svm_allocation( + alloc.context(), mem.ptr, size, mem.queue.data()); + else + return new pyopencl::svm_allocation( + alloc.context(), mem.ptr, size, nullptr); + } + catch (pyopencl::error &e) + { + if (!e.is_out_of_memory()) + throw; + if (++try_count == 2) + throw; + } + + alloc.try_release_blocks(); + } } // }}} + // {{{ allocate_from_svm_ppol + + pooled_svm *allocate_from_svm_ppol( + std::shared_ptr > pool, + pyopencl::memory_pool::size_type sz) + { + return new pooled_svm(pool, sz); + } + + // }}} +#endif +} +namespace { template void expose_memory_pool(Wrapper &wrapper) { @@ -315,6 +553,9 @@ namespace .DEF_SIMPLE_METHOD(alloc_size) .DEF_SIMPLE_METHOD(free_held) .DEF_SIMPLE_METHOD(stop_holding) + + // undoc for now + .def("_set_trace", &cls::set_trace) ; } } @@ -327,22 +568,24 @@ void 
pyopencl_expose_mempool(py::module &m) m.def("bitlog2", pyopencl::bitlog2); { - typedef cl_allocator_base cls; - py::class_ wrapper( - m, "_tools_AllocatorBase"/*, py::no_init */); + typedef pyopencl::buffer_allocator_base cls; + py::class_> wrapper(m, "AllocatorBase"); wrapper - .def("__call__", allocator_call) + .def("__call__", pyopencl::allocate_from_buffer_allocator, py::arg("size")) ; } { - typedef pyopencl::memory_pool cls; + typedef pyopencl::memory_pool cls; py::class_> wrapper( m, "_TestMemoryPool"); wrapper .def(py::init([](unsigned leading_bits_in_bin_id) - { return new cls(test_allocator(), leading_bits_in_bin_id); }), + { return new cls( + std::shared_ptr( + new pyopencl::test_allocator()), + leading_bits_in_bin_id); }), py::arg("leading_bits_in_bin_id")=4 ) .def("allocate", [](std::shared_ptr pool, cls::size_type sz) @@ -356,9 +599,9 @@ void pyopencl_expose_mempool(py::module &m) } { - typedef cl_deferred_allocator cls; - py::class_ wrapper( - m, "_tools_DeferredAllocator"); + typedef pyopencl::deferred_buffer_allocator cls; + py::class_> wrapper( + m, "DeferredAllocator"); wrapper .def(py::init< std::shared_ptr const &>()) @@ -370,9 +613,9 @@ void pyopencl_expose_mempool(py::module &m) } { - typedef cl_immediate_allocator cls; - py::class_ wrapper( - m, "_tools_ImmediateAllocator"); + typedef pyopencl::immediate_buffer_allocator cls; + py::class_> wrapper( + m, "ImmediateAllocator"); wrapper .def(py::init()) .def(py::init(), @@ -381,33 +624,77 @@ void pyopencl_expose_mempool(py::module &m) } { - typedef pyopencl::memory_pool cls; + typedef pyopencl::pooled_buffer cls; + py::class_(m, "PooledBuffer") + .def("release", &cls::free) - py::class_< - cls, /* boost::noncopyable, */ - std::shared_ptr> wrapper( m, "MemoryPool"); + .def("bind_to_queue", [](cls &self, pyopencl::command_queue &queue) { /* no-op */ }) + .def("unbind_from_queue", [](cls &self) { /* no-op */ }) + ; + } + + { + typedef pyopencl::memory_pool cls; + + py::class_> wrapper( m, "MemoryPool"); wrapper - .def(py::init(), + .def(py::init, unsigned>(), py::arg("allocator"), py::arg("leading_bits_in_bin_id")=4 ) - .def("allocate", device_pool_allocate) - .def("__call__", device_pool_allocate) - // undoc for now - .DEF_SIMPLE_METHOD(set_trace) + .def("allocate", pyopencl::allocate_from_buffer_pool, py::arg("size")) + .def("__call__", pyopencl::allocate_from_buffer_pool, py::arg("size")) ; expose_memory_pool(wrapper); } +#if PYOPENCL_CL_VERSION >= 0x2000 + { + typedef pyopencl::svm_allocator cls; + py::class_> wrapper(m, "SVMAllocator"); + wrapper + .def(py::init const &, cl_uint, cl_uint, pyopencl::command_queue *>(), + py::arg("context"), + py::kw_only(), + py::arg("alignment")=0, + py::arg("flags")=CL_MEM_READ_WRITE, + py::arg("queue").none(true)=nullptr + ) + .def("__call__", pyopencl::svm_allocator_call, py::arg("size")) + ; + } + { - typedef pooled_buffer cls; - py::class_( - m, "PooledBuffer"/* , py::no_init */) + typedef pyopencl::pooled_svm cls; + py::class_(m, "PooledSVM") .def("release", &cls::free) + .def("enqueue_release", &cls::free) + .def("__eq__", [](const cls &self, const cls &other) + { return self.svm_ptr() == other.svm_ptr(); }) + .def("__hash__", [](cls &self) { return (intptr_t) self.svm_ptr(); }) + .DEF_SIMPLE_METHOD(bind_to_queue) + .DEF_SIMPLE_METHOD(unbind_from_queue) ; } + + { + typedef pyopencl::memory_pool cls; + + py::class_> wrapper( m, "SVMPool"); + wrapper + .def(py::init, unsigned>(), + py::arg("allocator"), + py::kw_only(), + py::arg("leading_bits_in_bin_id")=4 + ) + 
.def("__call__", pyopencl::allocate_from_svm_ppol, py::arg("size")) + ; + + expose_memory_pool(wrapper); + } + +#endif } // vim: foldmethod=marker