Public API

API discovery

In many applications it would be enough to use dynamic module attributes to get an API object:

from grunnur import cuda_api
from grunnur import opencl_api
from grunnur import any_api

For a finer programmatic control one can use the methods of the API class:

class grunnur.API

A generalized GPGPU API.

classmethod all_available()List[API]

Returns a list of API objects for which backends are available.

classmethod all_by_shortcut(shortcut: Optional[str] = None)List[API]

If shortcut is a string, returns a list of one API object whose id attribute has its shortcut attribute equal to it (or raises an error if it was not found, or its backend is not available).

If shortcut is None, returns a list of all available API objects.

Parameters

shortcut – an API shortcut to match.

classmethod from_api_id(api_id: grunnur.adapter_base.APIID)API

Creates an API object out of an identifier.

Parameters

api_id – API identifier.

id: grunnur.adapter_base.APIID

This API’s ID.

property platforms

A list of this API’s Platform objects.

shortcut: str

A shortcut for this API (to use in all_by_shortcut(), usually coming from some kind of a CLI). Equal to id.shortcut.

class grunnur.adapter_base.APIID

An ID of an API object.

shortcut: str

This API’s shortcut.

grunnur.cuda_api_id()grunnur.adapter_base.APIID

Returns the identifier of CUDA API.

grunnur.opencl_api_id()grunnur.adapter_base.APIID

Returns the identifier of OpenCL API.

grunnur.all_api_ids()List[grunnur.adapter_base.APIID]

Returns a list of identifiers for all APIs available.

Platforms

A platform is an OpenCL term, but we use it for CUDA API as well for the sake of uniformity. Naturally, there will always be a single (dummy) platform in CUDA.

class grunnur.Platform

A generalized GPGPU platform.

classmethod all(api: API)List[Platform]

Returns a list of platforms available for the given API.

Parameters

api – the API to search in.

classmethod all_by_masks(api: API, include_masks: Optional[Sequence[str]] = None, exclude_masks: Optional[Sequence[str]] = None)List[Platform]

Returns a list of all platforms with names satisfying the given criteria.

Parameters
  • api – the API to search in.

  • include_masks – a list of strings (treated as regexes), one of which must match with the platform name.

  • exclude_masks – a list of strings (treated as regexes), neither of which must match with the platform name.

classmethod from_backend_platform(obj)Platform

Wraps a backend platform object into a Grunnur platform object.

classmethod from_index(api: API, platform_idx: int)Platform

Creates a platform based on its index in the list returned by the API.

Parameters
  • api – the API to search in.

  • platform_idx – the target platform’s index.

api: API

The API object this platform belongs to.

property devices

A list of this device’s Device objects.

name: str

The platform’s name.

vendor: str

The platform’s vendor.

version: str

The platform’s version.

Devices

class grunnur.Device

A generalized GPGPU device.

classmethod all(platform: Platform)List[Device]

Returns a list of devices available for the given platform.

Parameters

platform – the platform to search in.

classmethod all_by_masks(platform: Platform, include_masks: Optional[Sequence[str]] = None, exclude_masks: Optional[Sequence[str]] = None, unique_only: bool = False, include_pure_parallel_devices: bool = False)List[Device]

Returns a list of all devices satisfying the given criteria.

Parameters
  • platform – the platform to search in.

  • include_masks – a list of strings (treated as regexes), one of which must match with the device name.

  • exclude_masks – a list of strings (treated as regexes), neither of which must match with the device name.

  • unique_only – if True, only return devices with unique names.

  • include_pure_parallel_devices – if True, include devices with params.max_total_local_size equal to 1.

classmethod from_backend_device(obj)Device

Wraps a backend device object into a Grunnur device object.

classmethod from_index(platform: Platform, device_idx: int)Device

Creates a device based on its index in the list returned by the API.

Parameters
  • platform – the API to search in.

  • device_idx – the target device’s index.

name: str

This device’s name.

property params

Returns a DeviceParameters object associated with this device.

platform: Platform

The Platform object this device belongs to.

class grunnur.adapter_base.DeviceParameters

An object containing device’s specifications.

abstract property compute_units

The number of multiprocessors (CUDA)/compute units (OpenCL) for the device.

abstract property local_mem_banks

The number of independent channels for shared (CUDA)/local (OpenCL) memory, which can be used from one warp without request serialization.

abstract property local_mem_size

The size of shared (CUDA)/local (OpenCL) memory (in bytes).

abstract property max_local_sizes

The maximum number of threads in one block (CUDA), or work items in one work group (OpenCL) for each of the available dimensions.

abstract property max_num_groups

The maximum number of blocks (CUDA)/work groups (OpenCL) for each of the available dimensions.

abstract property max_total_local_size

The maximum total number of threads in one block (CUDA), or work items in one work group (OpenCL).

abstract property type

Device type.

abstract property warp_size

The number of threads (CUDA)/work items (OpenCL) that are executed synchronously (within one multiprocessor/compute unit).

class grunnur.adapter_base.DeviceType

An enum representing a device’s type.

CPU = 1

CPU type

GPU = 2

GPU type

Device discovery

grunnur.platforms_and_devices_by_mask(api, quantity: Optional[int] = 1, platform_include_masks: Optional[Sequence[str]] = None, platform_exclude_masks: Optional[Sequence[str]] = None, device_include_masks: Optional[Sequence[str]] = None, device_exclude_masks: Optional[Sequence[str]] = None, unique_devices_only: bool = False, include_pure_parallel_devices: bool = False)List[Tuple[Platform, List[Device]]]

Returns all tuples (platform, list of devices) where the platform name and device names satisfy the given criteria, and there are at least quantity devices in the list.

Parameters
grunnur.select_devices(api, interactive: bool = False, quantity: Optional[int] = 1, **device_filters)List[Device]

Using the results from platforms_and_devices_by_mask(), either lets the user select the devices (from the ones matching the criteria) interactively, or takes the first matching list of quantity devices.

Parameters

Contexts

class grunnur.Context

GPGPU context.

deactivate()

CUDA API only: deactivates this context, popping all the CUDA context objects from the stack.

classmethod from_backend_contexts(backend_contexts, take_ownership: bool = False)Context

Creates a context from a single or several backend device contexts. If take_ownership is True, this object will be responsible for the lifetime of backend context objects (important for CUDA backend).

classmethod from_backend_devices(backend_devices)Context

Creates a context from a single or several backend device objects.

classmethod from_criteria(api: API, interactive: bool = False, devices_num: Optional[int] = 1, **device_filters)Context

Finds devices matching the given criteria and creates a Context object out of them.

Parameters
classmethod from_devices(devices: Union[Device, Iterable[Device]])Context

Creates a context from a device or an iterable of devices.

Parameters

devices – one or several devices to use.

api: API

The API this context is based on.

devices: Tuple[Device]

Devices in this context.

platform: Platform

The platform this context is based on.

Queues

class grunnur.Queue(context: Context, device_idx: Optional[int] = None)

A queue on a single device.

Parameters
  • context – a context on which to create a queue.

  • device_idx – device index in the context on which to create a queue. If there is more than one device in the context, it must be specified.

synchronize()

Blocks until sub-queues on all devices are empty.

context: Context

This queue’s context.

device: Device

Device object this queue operates on.

device_idx: int

Device index this queue operates on.

class grunnur.MultiQueue(context: Context, queues: Optional[Iterable[Queue]] = None)

A queue on multiple devices.

Parameters
  • context – a context on which to create a queue.

  • queues – single-device queues (must belong to distinct devices).

classmethod on_device_idxs(context: Context, device_idxs: Iterable[int])MultiQueue

Creates a queue from provided device indexes (in the context).

synchronize()

Blocks until queues on all devices are empty.

context: Context

This queue’s context.

device_idxs: Set[int]

Device indices (in the context) this queue operates on.

devices: Dict[int, Device]

Device objects associated with device indices.

queues: Dict[int, Queue]

Single-device queues associated with device indices.

Buffers and arrays

class grunnur.Buffer

A memory buffer on device.

classmethod allocate(context: Context, size: int, device_idx: Optional[int] = None)Buffer

Allocate a buffer of size bytes.

Parameters
  • context – the context to use.

  • size – the buffer’s size in bytes.

  • device_idx – the device to allocate on (can be omitted in a single-device context).

get(queue: Queue, host_array: numpy.ndarray, async_: bool = False)

Copy the contents of the buffer to the host array.

Parameters
  • queue – the queue to use for the transfer.

  • host_array – the destination array.

  • async – if True, the transfer is performed asynchronously.

get_sub_region(origin: int, size: int)Buffer

Return a buffer object describing a subregion of this buffer.

Parameters
  • origin – the offset of the subregion.

  • size – the size of the subregion.

set(queue: Queue, buf: Union[numpy.ndarray, Buffer], no_async: bool = False)

Copy the contents of the host array or another buffer to this buffer.

Parameters
  • queue – the queue to use for the transfer.

  • buf – the source - numpy array or a Buffer object.

  • no_async – if True, the transfer blocks until completion.

context: Context

Context this buffer is allocated on.

device_idx: int

The index of the device this buffer is allocated on.

property offset

Offset of this buffer (in bytes) from the beginning of the physical allocation it resides in.

property size

This buffer’s size (in bytes).

class grunnur.Array

Array on a single device.

__getitem__(slices)Array

Returns a view of this array.

classmethod empty(context: Context, shape: Sequence[int], dtype: numpy.dtype, allocator: Optional[Callable[[int, int], Buffer]] = None, device_idx: Optional[int] = None)Array

Creates an empty array.

Parameters
  • shape – array shape.

  • dtype – array data type.

  • allocator – an optional callable taking two integer arguments (buffer size in bytes, and the device to allocate it on) and returning a Buffer object.

  • device_idx – the index of the device on which to allocate the array.

classmethod from_host(queue: Queue, host_arr: numpy.ndarray)Array

Creates an array object from a host array.

Parameters
  • queue – the queue to use for the transfer.

  • host_arr – the source array.

get(queue: Queue, dest: Optional[numpy.ndarray] = None, async_: bool = False)numpy.ndarray

Copies the contents of the array to the host array and returns it.

Parameters
  • queue – the queue to use for the transfer.

  • dest – the destination array. If None, the target array is created.

  • async – if True, the transfer is performed asynchronously.

set(queue: Queue, array: Union[numpy.ndarray, Array], no_async: bool = False)

Copies the contents of the host array to the array.

Parameters
  • queue – the queue to use for the transfer.

  • array – the source array.

  • no_async – if True, the transfer blocks until completion.

context: Context

Context this array is allocated on.

dtype: numpy.dtype

Array item data type.

shape: Tuple[int, ]

Array shape.

strides: Tuple[int, ]

Array strides.

class grunnur.array.BaseSplay

Base class for splay strategies for MultiArray.

abstract __call__(arr: ArrayLike, devices: Dict[int, grunnur.Device])Dict[int, ArrayLike]

Creates a dictionary of views of an array-like object for each of the given devices.

Parameters
  • arr – an array-like object.

  • devices – a dictionary of device indices matched to device objects.

ArrayLike

The type of an array-like object (the one having a shape and supporting views via __getitem__())

alias of TypeVar(‘ArrayLike’)

class grunnur.MultiArray

An array on multiple devices.

class CloneSplay

Copies the given array to each device.

class EqualSplay

Splays the given array equally between the devices using the outermost dimension. The outermost dimension should be larger or equal to the number of devices.

classmethod empty(context: Context, shape: Sequence[int], dtype: numpy.dtype, allocator: Optional[Callable[[int, int], Buffer]] = None, device_idxs: Optional[Iterable[int]] = None, splay: Optional[grunnur.array.BaseSplay] = None)MultiArray

Creates an empty array.

Parameters
  • shape – array shape.

  • dtype – array data type.

  • allocator – an optional callable taking two integer arguments (buffer size in bytes, and the device to allocate it on) and returning a Buffer object.

  • device_idx – the index of the device on which to allocate the array.

  • splay – the splay strategy (if None, an EqualSplay object is used).

classmethod from_host(mqueue: MultiQueue, host_arr: numpy.ndarray, splay: Optional[grunnur.array.BaseSplay] = None)MultiArray

Creates an array object from a host array.

Parameters
  • mqueue – the queue to use for the transfer.

  • host_arr – the source array.

  • splay – the splay strategy (if None, an EqualSplay object is used).

get(mqueue: MultiQueue, dest: Optional[numpy.ndarray] = None, async_: bool = False)numpy.ndarray

Copies the contents of the array to the host array and returns it.

Parameters
  • mqueue – the queue to use for the transfer.

  • dest – the destination array. If None, the target array is created.

  • async – if True, the transfer is performed asynchronously.

set(mqueue: MultiQueue, array: Union[numpy.ndarray, MultiArray], no_async: bool = False)

Copies the contents of the host array to the array.

Parameters
  • mqueue – the queue to use for the transfer.

  • array – the source array.

  • no_async – if True, the transfer blocks until completion.

context: Context

Context this array is allocated on.

dtype: numpy.dtype

Array item data type.

shape: Tuple[int, ]

Array shape.

shapes: Dict[int, Tuple[int, ]]

Sub-array shapes matched to device indices.

Programs and kernels

class grunnur.Program(context: Context, template_src: Union[str, Callable[[], str], DefTemplate, Snippet], device_idxs: Optional[Sequence[int]] = None, no_prelude: bool = False, fast_math: bool = False, render_args: Union[List, Tuple] = [], render_globals: Dict = {}, compiler_options: Iterable[str] = [], keep: bool = False, constant_arrays: Mapping[str, Tuple[int, numpy.dtype]] = {})

A compiled program on device(s).

Parameters
  • context – context to compile the program on.

  • template_src – a string with the source code, or a Mako template source to render.

  • device_idxs – a list of device numbers to compile on. If None, compile on all context’s devices.

  • no_prelude – do not add prelude to the rendered source.

  • fast_math – compile using fast (but less accurate) math functions.

  • render_args – a list of positional args to pass to the template.

  • render_globals – a dictionary of globals to pass to the template.

  • compiler_options – a list of options to pass to the backend compiler.

  • keep – keep the intermediate files in a temporary directory.

  • constant_arrays – (CUDA only) a dictionary name: (size, dtype) of global constant arrays to be declared in the program.

set_constant_array(queue: Queue, name: str, arr: Union[Array, numpy.ndarray])

Uploads a constant array to the context’s devices (CUDA only).

Parameters
  • queue – the queue to use for the transfer.

  • name – the name of the constant array symbol in the code.

  • arr – either a device or a host array.

context: Context

The context this program was compiled for.

kernel: grunnur.program.KernelHub

An object whose attributes are Kernel objects with the corresponding names.

sources: Dict[int, str]

Source files used for each device.

class grunnur.program.KernelHub

An object providing access to the host program’s kernels.

__getattr__(kernel_name: str)Kernel

Returns a Kernel object for a function (CUDA)/kernel (OpenCL) with the name kernel_name.

class grunnur.program.Kernel

A kernel compiled for multiple devices.

__call__(queue: Union[grunnur.Queue, grunnur.MultiQueue], global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None, *args, **kwds)

A shortcut for Kernel.prepare() and subsequent PreparedKernel.__call__(). See their doc entries for details.

prepare(global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None)PreparedKernel

Prepares the kernel for execution.

If local_size or global_size are integer, they will be treated as 1-tuples.

One can pass specific global and local sizes for each device using dictionaries keyed with device indices. This achieves another purpose: the kernel will only be prepared for those devices, and not for all devices available in the context.

Parameters
  • global_size – the total number of threads (CUDA)/work items (OpenCL) in each dimension (column-major). Note that there may be a maximum size in each dimension as well as the maximum number of dimensions. See DeviceParameters for details.

  • local_size – the number of threads in a block (CUDA)/work items in a work group (OpenCL) in each dimension (column-major). If None, it will be chosen automatically.

property max_total_local_sizes

The maximum possible number of threads in a block (CUDA)/work items in a work group (OpenCL) for this kernel.

class grunnur.program.PreparedKernel

A kernel specialized for execution on a set of devices with all possible preparations and checks performed.

__call__(queue: Union[grunnur.Queue, grunnur.MultiQueue], *args, **kwds)

Enqueues the kernel on the devices in the given queue. The kernel must have been prepared for all of these devices.

If an argument is a Array or Buffer object, it must belong to the device on which the kernel is being executed (so queue must only have one device).

If an argument is a MultiArray, it should have subarrays on all the devices from the given queue.

If an argument is a numpy scalar, it will be passed to the kernel directly.

If an argument is a integer-keyed dict, its values corresponding to the device indices the kernel is executed on will be passed as kernel arguments.

Parameters
  • args – kernel arguments.

  • kwds – backend-specific keyword parameters.

Returns

a list of Event objects for enqueued kernels in case of PyOpenCL.

Static kernels

class grunnur.StaticKernel(context: Context, template_src: Union[str, Callable[[], str], DefTemplate, Snippet], name: str, global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None, render_globals: Dict = {}, constant_arrays: Mapping[str, Tuple[int, numpy.dtype]] = {}, **kwds)

An object containing a GPU kernel with fixed call sizes.

The globals for the source template will contain an object with the name static of the type VsizeModules containing the id/size functions to be used instead of regular ones.

Parameters
  • context – context to compile the kernel on.

  • template_src – a string with the source code, or a Mako template source to render.

  • name – the kernel’s name.

  • global_size – see prepare().

  • local_size – see prepare().

  • render_globals – a dictionary of globals to pass to the template.

  • constant_arrays – (CUDA only) a dictionary name: (size, dtype) of global constant arrays to be declared in the program.

__call__(queue, *args)

Execute the kernel. In case of the OpenCL backend, returns a pyopencl.Event object.

Parameters
set_constant_array(queue: Queue, name: str, arr: Union[Array, numpy.ndarray])

Uploads a constant array to the context’s devices (CUDA only).

Parameters
  • queue – the queue to use for the transfer.

  • name – the name of the constant array symbol in the code.

  • arr – either a device or a host array.

queue: Queue

The queue this static kernel was compiled and prepared for.

sources: Dict[int, str]

Source files used for each device.

class grunnur.vsize.VsizeModules(local_id, local_size, group_id, num_groups, global_id, global_size, global_flat_id, global_flat_size, begin)

A collection of modules passed to grunnur.StaticKernel. Should be used instead of regular group/thread id functions.

begin: grunnur.Module

Provides the statement ${begin} that should be used at the start of a static kernel function.

global_flat_id: grunnur.Module

Provides the function VSIZE_T ${global_flat_id}() returning the global id of the current thread with all dimensions flattened.

global_flat_size: grunnur.Module

Provides the function VSIZE_T ${global_flat_size}(). returning the global size of with all dimensions flattened.

global_id: grunnur.Module

Provides the function VSIZE_T ${global_id}(int dim) returning the global id of the current thread.

global_size: grunnur.Module

Provides the function VSIZE_T ${global_size}(int dim) returning the global size along dimension dim.

group_id: grunnur.Module

Provides the function VSIZE_T ${group_id}(int dim) returning the group id of the current thread.

local_id: grunnur.Module

Provides the function VSIZE_T ${local_id}(int dim) returning the local id of the current thread.

local_size: grunnur.Module

Provides the function VSIZE_T ${local_size}(int dim) returning the size of the current group.

num_groups: grunnur.Module

Provides the function VSIZE_T ${num_groups}(int dim) returning the number of groups in dimension dim.

Utilities

class grunnur.Template(mako_template: mako.template.Template)

A wrapper for mako Template objects.

classmethod from_associated_file(filename: str)Template

Returns a Template object created from the file which has the same name as filename and the extension .mako. Typically used in computation modules as Template.from_associated_file(__file__).

classmethod from_string(template_source: str)

Returns a Template object created from source.

get_def(name: str)DefTemplate

Returns the template def with the name name.

class grunnur.DefTemplate(name: str, mako_def_template: mako.template.DefTemplate, source: str)

A wrapper for Mako DefTemplate objects.

classmethod from_callable(name: str, callable_obj: Callable[[], str])DefTemplate

Creates a template def from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.

classmethod from_string(name: str, argnames: Iterable[str], source: str)DefTemplate

Creates a template def from a string with its body and a list of argument names.

render(*args, **globals_)str

Renders the template def with given arguments and globals.

class grunnur.RenderError(exception: Exception, args: tuple, globals_: dict, source: str)

A custom wrapper for Mako template render errors, to facilitate debugging.

exception: Exception

The original exception thrown by Mako’s render().

globals: dict

The globals used to render the template.

source: str

The source of the template.

class grunnur.Snippet(template: DefTemplate, render_globals: Mapping = {})

Contains a source snippet - a template function that will be rendered in place, with possible context that can include other Snippet or Module objects.

Creates a snippet out of a prepared template.

Parameters
  • template

  • render_globals

classmethod from_callable(callable_obj: Callable[[], str], name: str = '_snippet', render_globals: Mapping = {})Snippet

Creates a snippet from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.

Parameters
  • callable_obj – a callable returning the template source.

  • name – the snippet’s name (will simplify debugging)

  • render_globals – a dictionary of “globals” to be used when rendering the template.

classmethod from_string(source: str, name: str = '_snippet', render_globals: Mapping = {})Snippet

Creates a snippet from a template source, treated as a body of a template def with no arguments.

Parameters
  • source – a string with the template source.

  • name – the snippet’s name (will simplify debugging)

  • render_globals – a dictionary of “globals” to be used when rendering the template.

class grunnur.Module(template: DefTemplate, render_globals: Mapping = {})

Contains a source module - a template function that will be rendered at root level, and the place where it was called will receive its unique identifier (prefix), which is used to prefix all module’s functions, types and macros in the global namespace.

Creates a module out of a prepared template.

Parameters
  • template

  • render_globals

classmethod from_callable(callable_obj: Callable[[], str], name: str = '_module', render_globals: Mapping = {})Module

Creates a module from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.

The prefix will be passed as the first argument to the template def on render.

Parameters
  • callable_obj – a callable returning the template source.

  • name – the module’s name (will simplify debugging)

  • render_globals – a dictionary of “globals” to be used when rendering the template.

classmethod from_string(source: str, name: str = '_module', render_globals: Mapping = {})Module

Creates a module from a template source, treated as a body of a template def with a single argument (prefix).

Parameters
  • source – a string with the template source.

  • name – the module’s name (will simplify debugging)

  • render_globals – a dictionary of “globals” to be used when rendering the template.

Data type utilities

C interop

grunnur.dtypes.ctype(dtype: numpy.dtype)Union[str, grunnur.Module]

Returns an object that can be passed as a global to Program() and used to render a C equivalent of the given numpy dtype. If there is a built-in C equivalent, the object is just a string with the type name; otherwise it is a Module object containing the corresponding struct declaration.

Note

If dtype is a struct type, it needs to be aligned (see ctype_struct() and align()).

Parameters

dtype

grunnur.dtypes.ctype_struct(dtype: Union[Type, numpy.dtype], ignore_alignment: bool = False)grunnur.Module

For a struct type, returns a Module object with the typedef of a struct corresponding to the given dtype (with its name set to the module prefix).

The structure definition includes the alignment required to produce field offsets specified in dtype; therefore, dtype must be either a simple type, or have proper offsets and dtypes (the ones that can be reporoduced in C using explicit alignment attributes, but without additional padding) and the attribute isalignedstruct == True. An aligned dtype can be produced either by standard means (aligned flag in numpy.dtype constructor and explicit offsets and itemsizes), or created out of an arbitrary dtype with the help of align().

If ignore_alignment is True, all of the above is ignored. The C structures produced will not have any explicit alignment modifiers. As a result, the the field offsets of dtype may differ from the ones chosen by the compiler.

Modules are cached, and the function returns a single module instance for equal dtype’s. Therefore inside a kernel it will be rendered with the same prefix everywhere it is used. This results in a behavior characteristic for a structural type system, same as for the basic dtype-ctype conversion.

Parameters
  • dtype

  • ignore_alignment

Warning

As of numpy 1.8, the isalignedstruct attribute is not enough to ensure a mapping between a dtype and a C struct with only the fields that are present in the dtype. Therefore, ctype_struct will make some additional checks and raise ValueError if it is not the case.

grunnur.dtypes.complex_ctr(dtype: numpy.dtype)str

Returns name of the constructor for the given dtype.

Parameters

dtype

grunnur.dtypes.c_constant(val, dtype: Optional[numpy.dtype] = None)str

Returns a C-style numerical constant. If val has a struct dtype, the generated constant will have the form { ... } and can be used as an initializer for a variable.

Parameters
  • val

  • dtype

grunnur.dtypes.align(dtype: numpy.dtype)numpy.dtype

Returns a new struct dtype with the field offsets changed to the ones a compiler would use (without being given any explicit alignment qualifiers). Ignores all existing explicit itemsizes and offsets.

Parameters

dtype

Struct helpers

grunnur.dtypes.c_path(path: List[Union[str, int]])str

Returns a string corresponding to the path to a struct element in C. The path is the sequence of field names/array indices returned from flatten_dtype().

Parameters

path

grunnur.dtypes.flatten_dtype(dtype: numpy.dtype)List[Tuple[List[Union[str, int]], numpy.dtype]]

Returns a list of tuples (path, dtype) for each of the basic dtypes in a (possibly nested) dtype. path is a list of field names/array indices leading to the corresponding element.

Parameters

dtype

grunnur.dtypes.extract_field(arr: numpy.ndarray, path: List[Union[str, int]])numpy.ndarray

Extracts an element from an array of struct dtype. The path is the sequence of field names/array indices returned from flatten_dtype().

Parameters
  • arr

  • path

Data type checks and conversions

grunnur.dtypes.normalize_type(dtype: Union[Type, numpy.dtype])numpy.dtype

Numpy’s dtype shortcuts (e.g. numpy.int32) are type objects and have slightly different properties from actual numpy.dtype objects. This function converts the former to numpy.dtype and keeps the latter unchanged.

Parameters

dtype

grunnur.dtypes.is_complex(dtype: numpy.dtype)bool

Returns True if dtype is complex.

Parameters

dtype

grunnur.dtypes.is_double(dtype: numpy.dtype)bool

Returns True if dtype is double precision floating point.

Parameters

dtype

grunnur.dtypes.is_integer(dtype: numpy.dtype)bool

Returns True if dtype is an integer.

Parameters

dtype

grunnur.dtypes.is_real(dtype: numpy.dtype)bool

Returns True if dtype is a real number (but not complex).

Parameters

dtype

grunnur.dtypes.result_type(*dtypes: numpy.dtype)numpy.dtype

Wrapper for numpy.result_type() which takes into account types supported by GPUs.

Parameters

dtypes

grunnur.dtypes.min_scalar_type(val, force_signed: bool = False)numpy.dtype

Wrapper for numpy.min_scalar_dtype() which takes into account types supported by GPUs.

If force_signed is True, a signed type will be returned even if val is positive.

grunnur.dtypes.detect_type(val)numpy.dtype

Returns the data type of val.

grunnur.dtypes.complex_for(dtype: numpy.dtype)numpy.dtype

Returns complex dtype corresponding to given floating point dtype.

Parameters

dtype

grunnur.dtypes.real_for(dtype: numpy.dtype)numpy.dtype

Returns floating point dtype corresponding to given complex dtype.

Parameters

dtype

grunnur.dtypes.cast(dtype: numpy.dtype)Callable[[Any], Any]

Returns function that takes one argument and casts it to dtype.

Parameters

dtype

Function modules

This module contains Module factories which are used to compensate for the lack of complex number operations in OpenCL, and the lack of C++ synthax which would allow one to write them.

grunnur.functions.add(*in_dtypes: numpy.dtype, out_dtype: Optional[numpy.dtype] = None)grunnur.Module

Returns a Module with a function of len(in_dtypes) arguments that adds values of types in_dtypes. If out_dtype is given, it will be set as a return type for this function.

This is necessary since on some platforms complex numbers are based on 2-vectors, and therefore the + operator for a complex and a real number works in an unexpected way (returning (a.x + b, a.y + b) instead of (a.x + b, a.y)).

Parameters
  • in_dtypes

  • out_dtype

grunnur.functions.cast(in_dtype: numpy.dtype, out_dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of one argument that casts values of in_dtype to out_dtype.

Parameters
  • in_dtype

  • out_dtype

grunnur.functions.conj(dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of one argument that conjugates the value of type dtype (if it is not a complex data type, the value will not be modified).

Parameters

dtype

grunnur.functions.div(dividend_dtype: numpy.dtype, divisor_dtype: numpy.dtype, out_dtype: Optional[numpy.dtype] = None)grunnur.Module

Returns a Module with a function of two arguments that divides a value of type dividend_dtype by a value of type divisor_dtype. If out_dtype is given, it will be set as a return type for this function.

Parameters
  • dividend_dtype

  • divisor_dtype

  • out_dtype

grunnur.functions.exp(dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of one argument that exponentiates the value of type dtype (must be a real or a complex data type).

Parameters

dtype

grunnur.functions.mul(*in_dtypes: numpy.dtype, out_dtype: Optional[numpy.dtype] = None)grunnur.Module

Returns a Module with a function of len(in_dtypes) arguments that multiplies values of types in_dtypes. If out_dtype is given, it will be set as a return type for this function.

Parameters
  • in_dtypes

  • out_dtype

grunnur.functions.norm(dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of one argument that returns the 2-norm of the value of type dtype (product by the complex conjugate if the value is complex, square otherwise).

Parameters

dtype

grunnur.functions.polar(dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of two arguments that returns the complex-valued rho * exp(i * theta) for values rho, theta of type dtype (must be a real data type).

Parameters

dtype

grunnur.functions.polar_unit(dtype: numpy.dtype)grunnur.Module

Returns a Module with a function of one argument that returns a complex number exp(i * theta) == (cos(theta), sin(theta)) for a value theta of type dtype (must be a real data type).

Parameters

dtype

grunnur.functions.pow(base_dtype: numpy.dtype, exponent_dtype: Optional[numpy.dtype] = None, out_dtype: Optional[numpy.dtype] = None)grunnur.Module

Returns a Module with a function of two arguments that raises the first argument of type base_dtype to the power of the second argument of type exponent_dtype (an integer or real data type).

If exponent_dtype or out_dtype are not given, they default to base_dtype. If base_dtype is not the same as out_dtype, the input is cast to out_dtype before exponentiation. If exponent_dtype is real, but both base_dtype and out_dtype are integer, a ValueError is raised.

Parameters
  • base_dtype

  • exponent_dtype

  • out_dtype

Virtual buffers

Often one needs temporary buffers that are only used in one place in the code, but used many times. Allocating them each time they are used may involve too much overhead; allocating real buffers and storing them increases the program’s memory requirements. A possible middle ground is using virtual allocations, where several of them can use the samy physical allocation. The virtual allocation manager will make sure that two virtual buffers that are used simultaneously (as declared by the user) will not share the same physical space.

class grunnur.virtual_alloc.VirtualManager(context: grunnur.Context)

Base class for a manager of virtual allocations.

Parameters

context – an instance of Context.

allocator(dependencies=None)grunnur.virtual_alloc.VirtualAllocator

Create a callable to use for Array creation.

Parameters

dependencies – can be a Array instance (the ones containing persistent allocations will be ignored), an iterable with valid values, or an object with the attribute __virtual_allocations__ which is a valid value (the last two will be processed recursively).

pack(queue)

Packs the real allocations possibly reducing total memory usage. This process can be slow and may synchronize the base queue.

statistics()VirtualAllocationStatistics

Returns allocation statistics.

class grunnur.virtual_alloc.TrivialManager(*args, **kwds)

Trivial manager — allocates a separate buffer for each allocation request.

class grunnur.virtual_alloc.ZeroOffsetManager(*args, **kwds)

Tries to assign several allocation requests to a single real allocation, if dependencies allow that. All virtual allocations start from the beginning of real allocations.

class grunnur.virtual_alloc.VirtualAllocator(manager: VirtualManager, dependencies: Set[int])

A helper callable object to use as an allocator for Array creation. Encapsulates the dependencies (as identifiers, doesn’t hold references for actual objects).

class grunnur.virtual_alloc.VirtualAllocationStatistics

Virtual allocation details.

real_num: int

The number of physical allocations.

real_size_total: int

The total size of physical allocations (in bytes).

real_sizes: Dict[int, int]

A dictionary size: count with the counts for physical allocations of each size.

virtual_num: int

The number of virtual allocations.

virtual_size_total: int

The total size of virtual allocations (in bytes).

virtual_sizes: Dict[int, int]

A dictionary size: count with the counts for virtual allocations of each size.

Kernel toolbox

There is a set of macros attached to any kernel depending on the API it is being compiled for:

GRUNNUR_CUDA_API

If defined, specifies that the kernel is being compiled for CUDA API.

GRUNNUR_OPENCL_API

If defined, specifies that the kernel is being compiled for CUDA API.

GRUNNUR_FAST_MATH

If defined, specifies that the compilation for this kernel was requested with fast_math == True.

LOCAL_BARRIER

Synchronizes threads inside a block.

FUNCTION

Modifier for a device-only function declaration.

KERNEL

Modifier for a kernel function declaration.

GLOBAL_MEM

Modifier for a global memory pointer argument.

LOCAL_MEM_DECL

Modifier for a statically allocated local memory variable.

LOCAL_MEM_DYNAMIC

Modifier for a dynamically allocated local memory variable (CUDA only).

LOCAL_MEM

Modifier for a local memory argument in device-only functions.

CONSTANT_MEM_DECL

Modifier for a statically allocated constant memory variable.

CONSTANT_MEM

Modifier for a constant memory argument in device-only functions.

INLINE

Modifier for inline functions.

SIZE_T

The type of local/global IDs and sizes. Equal to unsigned int for CUDA, and size_t for OpenCL (which can be 32- or 64-bit unsigned integer, depending on the device).

SIZE_T get_local_id(unsigned int dim)
SIZE_T get_group_id(unsigned int dim)
SIZE_T get_global_id(unsigned int dim)
SIZE_T get_local_size(unsigned int dim)
SIZE_T get_num_groups(unsigned int dim)
SIZE_T get_global_size(unsigned int dim)

Local, group and global identifiers and sizes. In case of CUDA mimic the behavior of corresponding OpenCL functions.

VSIZE_T

The type of local/global IDs in the virtual grid. It is separate from SIZE_T because the former is intended to be equivalent to what the backend is using, while VSIZE_T is a separate type and can be made larger than SIZE_T in the future if necessary.

ALIGN(int)

Used to specify an explicit alignment (in bytes) for fields in structures, as

typedef struct {
    char ALIGN(4) a;
    int b;
} MY_STRUCT;