Public API¶
API discovery¶
In many applications it would be enough to use dynamic module attributes to get an API
object:
from grunnur import cuda_api
from grunnur import opencl_api
from grunnur import any_api
For a finer programmatic control one can use the methods of the API
class:
-
class
grunnur.
API
¶ A generalized GPGPU API.
-
classmethod
all_available
() → List[API]¶ Returns a list of
API
objects for which backends are available.
-
classmethod
all_by_shortcut
(shortcut: Optional[str] = None) → List[API]¶ If
shortcut
is a string, returns a list of oneAPI
object whoseid
attribute has itsshortcut
attribute equal to it (or raises an error if it was not found, or its backend is not available).If
shortcut
isNone
, returns a list of all availableAPI
objects.- Parameters
shortcut – an API shortcut to match.
-
classmethod
from_api_id
(api_id: grunnur.adapter_base.APIID) → API¶ Creates an
API
object out of an identifier.- Parameters
api_id – API identifier.
-
id
: grunnur.adapter_base.APIID¶ This API’s ID.
-
shortcut
: str¶ A shortcut for this API (to use in
all_by_shortcut()
, usually coming from some kind of a CLI). Equal toid.shortcut
.
-
classmethod
-
grunnur.
cuda_api_id
() → grunnur.adapter_base.APIID¶ Returns the identifier of CUDA API.
-
grunnur.
opencl_api_id
() → grunnur.adapter_base.APIID¶ Returns the identifier of OpenCL API.
-
grunnur.
all_api_ids
() → List[grunnur.adapter_base.APIID]¶ Returns a list of identifiers for all APIs available.
Platforms¶
A platform is an OpenCL term, but we use it for CUDA API as well for the sake of uniformity. Naturally, there will always be a single (dummy) platform in CUDA.
-
class
grunnur.
Platform
¶ A generalized GPGPU platform.
-
classmethod
all
(api: API) → List[Platform]¶ Returns a list of platforms available for the given API.
- Parameters
api – the API to search in.
-
classmethod
all_by_masks
(api: API, include_masks: Optional[Sequence[str]] = None, exclude_masks: Optional[Sequence[str]] = None) → List[Platform]¶ Returns a list of all platforms with names satisfying the given criteria.
- Parameters
api – the API to search in.
include_masks – a list of strings (treated as regexes), one of which must match with the platform name.
exclude_masks – a list of strings (treated as regexes), neither of which must match with the platform name.
-
classmethod
from_backend_platform
(obj) → Platform¶ Wraps a backend platform object into a Grunnur platform object.
-
classmethod
from_index
(api: API, platform_idx: int) → Platform¶ Creates a platform based on its index in the list returned by the API.
- Parameters
api – the API to search in.
platform_idx – the target platform’s index.
-
name
: str¶ The platform’s name.
-
vendor
: str¶ The platform’s vendor.
-
version
: str¶ The platform’s version.
-
classmethod
Devices¶
-
class
grunnur.
Device
¶ A generalized GPGPU device.
-
classmethod
all
(platform: Platform) → List[Device]¶ Returns a list of devices available for the given platform.
- Parameters
platform – the platform to search in.
-
classmethod
all_by_masks
(platform: Platform, include_masks: Optional[Sequence[str]] = None, exclude_masks: Optional[Sequence[str]] = None, unique_only: bool = False, include_pure_parallel_devices: bool = False) → List[Device]¶ Returns a list of all devices satisfying the given criteria.
- Parameters
platform – the platform to search in.
include_masks – a list of strings (treated as regexes), one of which must match with the device name.
exclude_masks – a list of strings (treated as regexes), neither of which must match with the device name.
unique_only – if
True
, only return devices with unique names.include_pure_parallel_devices – if
True
, include devices withparams.max_total_local_size
equal to 1.
-
classmethod
from_backend_device
(obj) → Device¶ Wraps a backend device object into a Grunnur device object.
-
classmethod
from_index
(platform: Platform, device_idx: int) → Device¶ Creates a device based on its index in the list returned by the API.
- Parameters
platform – the API to search in.
device_idx – the target device’s index.
-
name
: str¶ This device’s name.
-
property
params
¶ Returns a
DeviceParameters
object associated with this device.
-
classmethod
-
class
grunnur.adapter_base.
DeviceParameters
¶ An object containing device’s specifications.
-
abstract property
compute_units
¶ The number of multiprocessors (CUDA)/compute units (OpenCL) for the device.
-
abstract property
local_mem_banks
¶ The number of independent channels for shared (CUDA)/local (OpenCL) memory, which can be used from one warp without request serialization.
-
abstract property
local_mem_size
¶ The size of shared (CUDA)/local (OpenCL) memory (in bytes).
-
abstract property
max_local_sizes
¶ The maximum number of threads in one block (CUDA), or work items in one work group (OpenCL) for each of the available dimensions.
-
abstract property
max_num_groups
¶ The maximum number of blocks (CUDA)/work groups (OpenCL) for each of the available dimensions.
-
abstract property
max_total_local_size
¶ The maximum total number of threads in one block (CUDA), or work items in one work group (OpenCL).
-
abstract property
type
¶ Device type.
-
abstract property
warp_size
¶ The number of threads (CUDA)/work items (OpenCL) that are executed synchronously (within one multiprocessor/compute unit).
-
abstract property
Device discovery¶
-
grunnur.
platforms_and_devices_by_mask
(api, quantity: Optional[int] = 1, platform_include_masks: Optional[Sequence[str]] = None, platform_exclude_masks: Optional[Sequence[str]] = None, device_include_masks: Optional[Sequence[str]] = None, device_exclude_masks: Optional[Sequence[str]] = None, unique_devices_only: bool = False, include_pure_parallel_devices: bool = False) → List[Tuple[Platform, List[Device]]]¶ Returns all tuples (platform, list of devices) where the platform name and device names satisfy the given criteria, and there are at least
quantity
devices in the list.- Parameters
quantity – the number of devices to find. If
None
, find all matching devices belonging to a single platform.platform_include_masks – passed to
Platform.all_by_masks()
.platform_exclude_masks – passed to
Platform.all_by_masks()
.device_include_masks – passed to
Device.all_by_masks()
.device_exclude_masks – passed to
Device.all_by_masks()
.unique_devices_only – passed to
Device.all_by_masks()
.include_pure_parallel_devices – passed to
Device.all_by_masks()
.
-
grunnur.
select_devices
(api, interactive: bool = False, quantity: Optional[int] = 1, **device_filters) → List[Device]¶ Using the results from
platforms_and_devices_by_mask()
, either lets the user select the devices (from the ones matching the criteria) interactively, or takes the first matching list ofquantity
devices.- Parameters
interactive – if
True
, shows a dialog to select the devices. IfFalse
, selects the first matching ones.quantity – passed to
platforms_and_devices_by_mask()
.device_filters – passed to
platforms_and_devices_by_mask()
.
Contexts¶
-
class
grunnur.
Context
¶ GPGPU context.
-
deactivate
()¶ CUDA API only: deactivates this context, popping all the CUDA context objects from the stack.
-
classmethod
from_backend_contexts
(backend_contexts, take_ownership: bool = False) → Context¶ Creates a context from a single or several backend device contexts. If
take_ownership
isTrue
, this object will be responsible for the lifetime of backend context objects (important for CUDA backend).
-
classmethod
from_backend_devices
(backend_devices) → Context¶ Creates a context from a single or several backend device objects.
-
classmethod
from_criteria
(api: API, interactive: bool = False, devices_num: Optional[int] = 1, **device_filters) → Context¶ Finds devices matching the given criteria and creates a
Context
object out of them.- Parameters
interactive – passed to
select_devices()
.devices_num – passed to
select_devices()
asquantity
.device_filters – passed to
select_devices()
.
-
Queues¶
-
class
grunnur.
Queue
(context: Context, device_idx: Optional[int] = None)¶ A queue on a single device.
- Parameters
context – a context on which to create a queue.
device_idx – device index in the context on which to create a queue. If there is more than one device in the context, it must be specified.
-
synchronize
()¶ Blocks until sub-queues on all devices are empty.
-
device_idx
: int¶ Device index this queue operates on.
-
class
grunnur.
MultiQueue
(context: Context, queues: Optional[Iterable[Queue]] = None)¶ A queue on multiple devices.
- Parameters
context – a context on which to create a queue.
queues – single-device queues (must belong to distinct devices).
-
classmethod
on_device_idxs
(context: Context, device_idxs: Iterable[int]) → MultiQueue¶ Creates a queue from provided device indexes (in the context).
-
synchronize
()¶ Blocks until queues on all devices are empty.
-
device_idxs
: Set[int]¶ Device indices (in the context) this queue operates on.
Buffers and arrays¶
-
class
grunnur.
Buffer
¶ A memory buffer on device.
-
classmethod
allocate
(context: Context, size: int, device_idx: Optional[int] = None) → Buffer¶ Allocate a buffer of
size
bytes.- Parameters
context – the context to use.
size – the buffer’s size in bytes.
device_idx – the device to allocate on (can be omitted in a single-device context).
-
get
(queue: Queue, host_array: numpy.ndarray, async_: bool = False)¶ Copy the contents of the buffer to the host array.
- Parameters
queue – the queue to use for the transfer.
host_array – the destination array.
async – if True, the transfer is performed asynchronously.
-
get_sub_region
(origin: int, size: int) → Buffer¶ Return a buffer object describing a subregion of this buffer.
- Parameters
origin – the offset of the subregion.
size – the size of the subregion.
-
set
(queue: Queue, buf: Union[numpy.ndarray, Buffer], no_async: bool = False)¶ Copy the contents of the host array or another buffer to this buffer.
- Parameters
queue – the queue to use for the transfer.
buf – the source -
numpy
array or aBuffer
object.no_async – if True, the transfer blocks until completion.
-
device_idx
: int¶ The index of the device this buffer is allocated on.
-
property
offset
¶ Offset of this buffer (in bytes) from the beginning of the physical allocation it resides in.
-
property
size
¶ This buffer’s size (in bytes).
-
classmethod
-
class
grunnur.
Array
¶ Array on a single device.
-
classmethod
empty
(context: Context, shape: Sequence[int], dtype: numpy.dtype, allocator: Optional[Callable[[int, int], Buffer]] = None, device_idx: Optional[int] = None) → Array¶ Creates an empty array.
- Parameters
shape – array shape.
dtype – array data type.
allocator – an optional callable taking two integer arguments (buffer size in bytes, and the device to allocate it on) and returning a
Buffer
object.device_idx – the index of the device on which to allocate the array.
-
classmethod
from_host
(queue: Queue, host_arr: numpy.ndarray) → Array¶ Creates an array object from a host array.
- Parameters
queue – the queue to use for the transfer.
host_arr – the source array.
-
get
(queue: Queue, dest: Optional[numpy.ndarray] = None, async_: bool = False) → numpy.ndarray¶ Copies the contents of the array to the host array and returns it.
- Parameters
queue – the queue to use for the transfer.
dest – the destination array. If
None
, the target array is created.async – if True, the transfer is performed asynchronously.
-
set
(queue: Queue, array: Union[numpy.ndarray, Array], no_async: bool = False)¶ Copies the contents of the host array to the array.
- Parameters
queue – the queue to use for the transfer.
array – the source array.
no_async – if True, the transfer blocks until completion.
-
dtype
: numpy.dtype¶ Array item data type.
-
shape
: Tuple[int, …]¶ Array shape.
-
strides
: Tuple[int, …]¶ Array strides.
-
classmethod
-
class
grunnur.array.
BaseSplay
¶ Base class for splay strategies for
MultiArray
.-
abstract
__call__
(arr: ArrayLike, devices: Dict[int, grunnur.Device]) → Dict[int, ArrayLike]¶ Creates a dictionary of views of an array-like object for each of the given devices.
- Parameters
arr – an array-like object.
devices – a dictionary of device indices matched to device objects.
-
ArrayLike
¶ The type of an array-like object (the one having a
shape
and supporting views via__getitem__()
)alias of TypeVar(‘ArrayLike’)
-
abstract
-
class
grunnur.
MultiArray
¶ An array on multiple devices.
-
class
CloneSplay
¶ Copies the given array to each device.
-
class
EqualSplay
¶ Splays the given array equally between the devices using the outermost dimension. The outermost dimension should be larger or equal to the number of devices.
-
classmethod
empty
(context: Context, shape: Sequence[int], dtype: numpy.dtype, allocator: Optional[Callable[[int, int], Buffer]] = None, device_idxs: Optional[Iterable[int]] = None, splay: Optional[grunnur.array.BaseSplay] = None) → MultiArray¶ Creates an empty array.
- Parameters
shape – array shape.
dtype – array data type.
allocator – an optional callable taking two integer arguments (buffer size in bytes, and the device to allocate it on) and returning a
Buffer
object.device_idx – the index of the device on which to allocate the array.
splay – the splay strategy (if
None
, anEqualSplay
object is used).
-
classmethod
from_host
(mqueue: MultiQueue, host_arr: numpy.ndarray, splay: Optional[grunnur.array.BaseSplay] = None) → MultiArray¶ Creates an array object from a host array.
- Parameters
mqueue – the queue to use for the transfer.
host_arr – the source array.
splay – the splay strategy (if
None
, anEqualSplay
object is used).
-
get
(mqueue: MultiQueue, dest: Optional[numpy.ndarray] = None, async_: bool = False) → numpy.ndarray¶ Copies the contents of the array to the host array and returns it.
- Parameters
mqueue – the queue to use for the transfer.
dest – the destination array. If
None
, the target array is created.async – if True, the transfer is performed asynchronously.
-
set
(mqueue: MultiQueue, array: Union[numpy.ndarray, MultiArray], no_async: bool = False)¶ Copies the contents of the host array to the array.
- Parameters
mqueue – the queue to use for the transfer.
array – the source array.
no_async – if True, the transfer blocks until completion.
-
dtype
: numpy.dtype¶ Array item data type.
-
shape
: Tuple[int, …]¶ Array shape.
-
shapes
: Dict[int, Tuple[int, …]]¶ Sub-array shapes matched to device indices.
-
class
Programs and kernels¶
-
class
grunnur.
Program
(context: Context, template_src: Union[str, Callable[[…], str], DefTemplate, Snippet], device_idxs: Optional[Sequence[int]] = None, no_prelude: bool = False, fast_math: bool = False, render_args: Union[List, Tuple] = [], render_globals: Dict = {}, compiler_options: Iterable[str] = [], keep: bool = False, constant_arrays: Mapping[str, Tuple[int, numpy.dtype]] = {})¶ A compiled program on device(s).
- Parameters
context – context to compile the program on.
template_src – a string with the source code, or a Mako template source to render.
device_idxs – a list of device numbers to compile on. If
None
, compile on all context’s devices.no_prelude – do not add prelude to the rendered source.
fast_math – compile using fast (but less accurate) math functions.
render_args – a list of positional args to pass to the template.
render_globals – a dictionary of globals to pass to the template.
compiler_options – a list of options to pass to the backend compiler.
keep – keep the intermediate files in a temporary directory.
constant_arrays – (CUDA only) a dictionary
name: (size, dtype)
of global constant arrays to be declared in the program.
-
set_constant_array
(queue: Queue, name: str, arr: Union[Array, numpy.ndarray])¶ Uploads a constant array to the context’s devices (CUDA only).
- Parameters
queue – the queue to use for the transfer.
name – the name of the constant array symbol in the code.
arr – either a device or a host array.
-
kernel
: grunnur.program.KernelHub¶ An object whose attributes are
Kernel
objects with the corresponding names.
-
sources
: Dict[int, str]¶ Source files used for each device.
-
class
grunnur.program.
KernelHub
¶ An object providing access to the host program’s kernels.
-
class
grunnur.program.
Kernel
¶ A kernel compiled for multiple devices.
-
__call__
(queue: Union[grunnur.Queue, grunnur.MultiQueue], global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None, *args, **kwds)¶ A shortcut for
Kernel.prepare()
and subsequentPreparedKernel.__call__()
. See their doc entries for details.
-
prepare
(global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None) → PreparedKernel¶ Prepares the kernel for execution.
If
local_size
orglobal_size
are integer, they will be treated as 1-tuples.One can pass specific global and local sizes for each device using dictionaries keyed with device indices. This achieves another purpose: the kernel will only be prepared for those devices, and not for all devices available in the context.
- Parameters
global_size – the total number of threads (CUDA)/work items (OpenCL) in each dimension (column-major). Note that there may be a maximum size in each dimension as well as the maximum number of dimensions. See
DeviceParameters
for details.local_size – the number of threads in a block (CUDA)/work items in a work group (OpenCL) in each dimension (column-major). If
None
, it will be chosen automatically.
-
property
max_total_local_sizes
¶ The maximum possible number of threads in a block (CUDA)/work items in a work group (OpenCL) for this kernel.
-
-
class
grunnur.program.
PreparedKernel
¶ A kernel specialized for execution on a set of devices with all possible preparations and checks performed.
-
__call__
(queue: Union[grunnur.Queue, grunnur.MultiQueue], *args, **kwds)¶ Enqueues the kernel on the devices in the given queue. The kernel must have been prepared for all of these devices.
If an argument is a
Array
orBuffer
object, it must belong to the device on which the kernel is being executed (soqueue
must only have one device).If an argument is a
MultiArray
, it should have subarrays on all the devices from the givenqueue
.If an argument is a
numpy
scalar, it will be passed to the kernel directly.If an argument is a integer-keyed
dict
, its values corresponding to the device indices the kernel is executed on will be passed as kernel arguments.- Parameters
args – kernel arguments.
kwds – backend-specific keyword parameters.
- Returns
a list of
Event
objects for enqueued kernels in case of PyOpenCL.
-
Static kernels¶
-
class
grunnur.
StaticKernel
(context: Context, template_src: Union[str, Callable[[…], str], DefTemplate, Snippet], name: str, global_size: Union[int, Sequence[int], Dict[int, Union[int, Sequence[int]]]], local_size: Union[int, Sequence[int], None, Dict[int, Optional[Union[int, Sequence[int]]]]] = None, render_globals: Dict = {}, constant_arrays: Mapping[str, Tuple[int, numpy.dtype]] = {}, **kwds)¶ An object containing a GPU kernel with fixed call sizes.
The globals for the source template will contain an object with the name
static
of the typeVsizeModules
containing the id/size functions to be used instead of regular ones.- Parameters
context – context to compile the kernel on.
template_src – a string with the source code, or a Mako template source to render.
name – the kernel’s name.
global_size – see
prepare()
.local_size – see
prepare()
.render_globals – a dictionary of globals to pass to the template.
constant_arrays – (CUDA only) a dictionary
name: (size, dtype)
of global constant arrays to be declared in the program.
-
__call__
(queue, *args)¶ Execute the kernel. In case of the OpenCL backend, returns a
pyopencl.Event
object.- Parameters
queue – the multi-device queue to use.
args – kernel arguments. See
grunnur.program.PreparedKernel.__call__()
.
-
set_constant_array
(queue: Queue, name: str, arr: Union[Array, numpy.ndarray])¶ Uploads a constant array to the context’s devices (CUDA only).
- Parameters
queue – the queue to use for the transfer.
name – the name of the constant array symbol in the code.
arr – either a device or a host array.
-
sources
: Dict[int, str]¶ Source files used for each device.
-
class
grunnur.vsize.
VsizeModules
(local_id, local_size, group_id, num_groups, global_id, global_size, global_flat_id, global_flat_size, begin)¶ A collection of modules passed to
grunnur.StaticKernel
. Should be used instead of regular group/thread id functions.-
begin
: grunnur.Module¶ Provides the statement
${begin}
that should be used at the start of a static kernel function.
-
global_flat_id
: grunnur.Module¶ Provides the function
VSIZE_T ${global_flat_id}()
returning the global id of the current thread with all dimensions flattened.
-
global_flat_size
: grunnur.Module¶ Provides the function
VSIZE_T ${global_flat_size}()
. returning the global size of with all dimensions flattened.
-
global_id
: grunnur.Module¶ Provides the function
VSIZE_T ${global_id}(int dim)
returning the global id of the current thread.
-
global_size
: grunnur.Module¶ Provides the function
VSIZE_T ${global_size}(int dim)
returning the global size along dimensiondim
.
-
group_id
: grunnur.Module¶ Provides the function
VSIZE_T ${group_id}(int dim)
returning the group id of the current thread.
-
local_id
: grunnur.Module¶ Provides the function
VSIZE_T ${local_id}(int dim)
returning the local id of the current thread.
-
local_size
: grunnur.Module¶ Provides the function
VSIZE_T ${local_size}(int dim)
returning the size of the current group.
-
num_groups
: grunnur.Module¶ Provides the function
VSIZE_T ${num_groups}(int dim)
returning the number of groups in dimensiondim
.
-
Utilities¶
-
class
grunnur.
Template
(mako_template: mako.template.Template)¶ A wrapper for mako
Template
objects.-
classmethod
from_associated_file
(filename: str) → Template¶ Returns a
Template
object created from the file which has the same name asfilename
and the extension.mako
. Typically used in computation modules asTemplate.from_associated_file(__file__)
.
-
get_def
(name: str) → DefTemplate¶ Returns the template def with the name
name
.
-
classmethod
-
class
grunnur.
DefTemplate
(name: str, mako_def_template: mako.template.DefTemplate, source: str)¶ A wrapper for Mako
DefTemplate
objects.-
classmethod
from_callable
(name: str, callable_obj: Callable[[…], str]) → DefTemplate¶ Creates a template def from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.
-
classmethod
from_string
(name: str, argnames: Iterable[str], source: str) → DefTemplate¶ Creates a template def from a string with its body and a list of argument names.
-
render
(*args, **globals_) → str¶ Renders the template def with given arguments and globals.
-
classmethod
-
class
grunnur.
RenderError
(exception: Exception, args: tuple, globals_: dict, source: str)¶ A custom wrapper for Mako template render errors, to facilitate debugging.
-
exception
: Exception¶ The original exception thrown by Mako’s render().
-
globals
: dict¶ The globals used to render the template.
-
source
: str¶ The source of the template.
-
-
class
grunnur.
Snippet
(template: DefTemplate, render_globals: Mapping = {})¶ Contains a source snippet - a template function that will be rendered in place, with possible context that can include other
Snippet
orModule
objects.Creates a snippet out of a prepared template.
- Parameters
template –
render_globals –
-
classmethod
from_callable
(callable_obj: Callable[[…], str], name: str = '_snippet', render_globals: Mapping = {}) → Snippet¶ Creates a snippet from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.
- Parameters
callable_obj – a callable returning the template source.
name – the snippet’s name (will simplify debugging)
render_globals – a dictionary of “globals” to be used when rendering the template.
-
classmethod
from_string
(source: str, name: str = '_snippet', render_globals: Mapping = {}) → Snippet¶ Creates a snippet from a template source, treated as a body of a template def with no arguments.
- Parameters
source – a string with the template source.
name – the snippet’s name (will simplify debugging)
render_globals – a dictionary of “globals” to be used when rendering the template.
-
class
grunnur.
Module
(template: DefTemplate, render_globals: Mapping = {})¶ Contains a source module - a template function that will be rendered at root level, and the place where it was called will receive its unique identifier (prefix), which is used to prefix all module’s functions, types and macros in the global namespace.
Creates a module out of a prepared template.
- Parameters
template –
render_globals –
-
classmethod
from_callable
(callable_obj: Callable[[…], str], name: str = '_module', render_globals: Mapping = {}) → Module¶ Creates a module from a callable returning a string. The parameter list of the callable is used to create the pararameter list of the resulting template def; the callable should return the body of a Mako template def regardless of the arguments it receives.
The prefix will be passed as the first argument to the template def on render.
- Parameters
callable_obj – a callable returning the template source.
name – the module’s name (will simplify debugging)
render_globals – a dictionary of “globals” to be used when rendering the template.
-
classmethod
from_string
(source: str, name: str = '_module', render_globals: Mapping = {}) → Module¶ Creates a module from a template source, treated as a body of a template def with a single argument (prefix).
- Parameters
source – a string with the template source.
name – the module’s name (will simplify debugging)
render_globals – a dictionary of “globals” to be used when rendering the template.
Data type utilities¶
C interop¶
-
grunnur.dtypes.
ctype
(dtype: numpy.dtype) → Union[str, grunnur.Module]¶ Returns an object that can be passed as a global to
Program()
and used to render a C equivalent of the givennumpy
dtype. If there is a built-in C equivalent, the object is just a string with the type name; otherwise it is aModule
object containing the correspondingstruct
declaration.Note
If
dtype
is a struct type, it needs to be aligned (seectype_struct()
andalign()
).- Parameters
dtype –
-
grunnur.dtypes.
ctype_struct
(dtype: Union[Type, numpy.dtype], ignore_alignment: bool = False) → grunnur.Module¶ For a struct type, returns a
Module
object with thetypedef
of a struct corresponding to the givendtype
(with its name set to the module prefix).The structure definition includes the alignment required to produce field offsets specified in
dtype
; therefore,dtype
must be either a simple type, or have proper offsets and dtypes (the ones that can be reporoduced in C using explicit alignment attributes, but without additional padding) and the attributeisalignedstruct == True
. An aligned dtype can be produced either by standard means (aligned
flag innumpy.dtype
constructor and explicit offsets and itemsizes), or created out of an arbitrary dtype with the help ofalign()
.If
ignore_alignment
is True, all of the above is ignored. The C structures produced will not have any explicit alignment modifiers. As a result, the the field offsets ofdtype
may differ from the ones chosen by the compiler.Modules are cached, and the function returns a single module instance for equal
dtype
’s. Therefore inside a kernel it will be rendered with the same prefix everywhere it is used. This results in a behavior characteristic for a structural type system, same as for the basic dtype-ctype conversion.- Parameters
dtype –
ignore_alignment –
Warning
As of
numpy
1.8, theisalignedstruct
attribute is not enough to ensure a mapping between a dtype and a C struct with only the fields that are present in the dtype. Therefore,ctype_struct
will make some additional checks and raiseValueError
if it is not the case.
-
grunnur.dtypes.
complex_ctr
(dtype: numpy.dtype) → str¶ Returns name of the constructor for the given
dtype
.- Parameters
dtype –
-
grunnur.dtypes.
c_constant
(val, dtype: Optional[numpy.dtype] = None) → str¶ Returns a C-style numerical constant. If
val
has a struct dtype, the generated constant will have the form{ ... }
and can be used as an initializer for a variable.- Parameters
val –
dtype –
-
grunnur.dtypes.
align
(dtype: numpy.dtype) → numpy.dtype¶ Returns a new struct dtype with the field offsets changed to the ones a compiler would use (without being given any explicit alignment qualifiers). Ignores all existing explicit itemsizes and offsets.
- Parameters
dtype –
Struct helpers¶
-
grunnur.dtypes.
c_path
(path: List[Union[str, int]]) → str¶ Returns a string corresponding to the
path
to a struct element in C. Thepath
is the sequence of field names/array indices returned fromflatten_dtype()
.- Parameters
path –
-
grunnur.dtypes.
flatten_dtype
(dtype: numpy.dtype) → List[Tuple[List[Union[str, int]], numpy.dtype]]¶ Returns a list of tuples
(path, dtype)
for each of the basic dtypes in a (possibly nested)dtype
.path
is a list of field names/array indices leading to the corresponding element.- Parameters
dtype –
-
grunnur.dtypes.
extract_field
(arr: numpy.ndarray, path: List[Union[str, int]]) → numpy.ndarray¶ Extracts an element from an array of struct dtype. The
path
is the sequence of field names/array indices returned fromflatten_dtype()
.- Parameters
arr –
path –
Data type checks and conversions¶
-
grunnur.dtypes.
normalize_type
(dtype: Union[Type, numpy.dtype]) → numpy.dtype¶ Numpy’s dtype shortcuts (e.g.
numpy.int32
) aretype
objects and have slightly different properties from actualnumpy.dtype
objects. This function converts the former tonumpy.dtype
and keeps the latter unchanged.- Parameters
dtype –
-
grunnur.dtypes.
is_complex
(dtype: numpy.dtype) → bool¶ Returns
True
ifdtype
is complex.- Parameters
dtype –
-
grunnur.dtypes.
is_double
(dtype: numpy.dtype) → bool¶ Returns
True
ifdtype
is double precision floating point.- Parameters
dtype –
-
grunnur.dtypes.
is_integer
(dtype: numpy.dtype) → bool¶ Returns
True
ifdtype
is an integer.- Parameters
dtype –
-
grunnur.dtypes.
is_real
(dtype: numpy.dtype) → bool¶ Returns
True
ifdtype
is a real number (but not complex).- Parameters
dtype –
-
grunnur.dtypes.
result_type
(*dtypes: numpy.dtype) → numpy.dtype¶ Wrapper for
numpy.result_type()
which takes into account types supported by GPUs.- Parameters
dtypes –
-
grunnur.dtypes.
min_scalar_type
(val, force_signed: bool = False) → numpy.dtype¶ Wrapper for
numpy.min_scalar_dtype()
which takes into account types supported by GPUs.If
force_signed
isTrue
, a signed type will be returned even ifval
is positive.
-
grunnur.dtypes.
detect_type
(val) → numpy.dtype¶ Returns the data type of
val
.
-
grunnur.dtypes.
complex_for
(dtype: numpy.dtype) → numpy.dtype¶ Returns complex dtype corresponding to given floating point
dtype
.- Parameters
dtype –
-
grunnur.dtypes.
real_for
(dtype: numpy.dtype) → numpy.dtype¶ Returns floating point dtype corresponding to given complex
dtype
.- Parameters
dtype –
-
grunnur.dtypes.
cast
(dtype: numpy.dtype) → Callable[[Any], Any]¶ Returns function that takes one argument and casts it to
dtype
.- Parameters
dtype –
Function modules¶
This module contains Module
factories
which are used to compensate for the lack of complex number operations in OpenCL,
and the lack of C++ synthax which would allow one to write them.
-
grunnur.functions.
add
(*in_dtypes: numpy.dtype, out_dtype: Optional[numpy.dtype] = None) → grunnur.Module¶ Returns a
Module
with a function oflen(in_dtypes)
arguments that adds values of typesin_dtypes
. Ifout_dtype
is given, it will be set as a return type for this function.This is necessary since on some platforms complex numbers are based on 2-vectors, and therefore the
+
operator for a complex and a real number works in an unexpected way (returning(a.x + b, a.y + b)
instead of(a.x + b, a.y)
).- Parameters
in_dtypes –
out_dtype –
-
grunnur.functions.
cast
(in_dtype: numpy.dtype, out_dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of one argument that casts values ofin_dtype
toout_dtype
.- Parameters
in_dtype –
out_dtype –
-
grunnur.functions.
conj
(dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of one argument that conjugates the value of typedtype
(if it is not a complex data type, the value will not be modified).- Parameters
dtype –
-
grunnur.functions.
div
(dividend_dtype: numpy.dtype, divisor_dtype: numpy.dtype, out_dtype: Optional[numpy.dtype] = None) → grunnur.Module¶ Returns a
Module
with a function of two arguments that divides a value of typedividend_dtype
by a value of typedivisor_dtype
. Ifout_dtype
is given, it will be set as a return type for this function.- Parameters
dividend_dtype –
divisor_dtype –
out_dtype –
-
grunnur.functions.
exp
(dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of one argument that exponentiates the value of typedtype
(must be a real or a complex data type).- Parameters
dtype –
-
grunnur.functions.
mul
(*in_dtypes: numpy.dtype, out_dtype: Optional[numpy.dtype] = None) → grunnur.Module¶ Returns a
Module
with a function oflen(in_dtypes)
arguments that multiplies values of typesin_dtypes
. Ifout_dtype
is given, it will be set as a return type for this function.- Parameters
in_dtypes –
out_dtype –
-
grunnur.functions.
norm
(dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of one argument that returns the 2-norm of the value of typedtype
(product by the complex conjugate if the value is complex, square otherwise).- Parameters
dtype –
-
grunnur.functions.
polar
(dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of two arguments that returns the complex-valuedrho * exp(i * theta)
for valuesrho, theta
of typedtype
(must be a real data type).- Parameters
dtype –
-
grunnur.functions.
polar_unit
(dtype: numpy.dtype) → grunnur.Module¶ Returns a
Module
with a function of one argument that returns a complex numberexp(i * theta) == (cos(theta), sin(theta))
for a valuetheta
of typedtype
(must be a real data type).- Parameters
dtype –
-
grunnur.functions.
pow
(base_dtype: numpy.dtype, exponent_dtype: Optional[numpy.dtype] = None, out_dtype: Optional[numpy.dtype] = None) → grunnur.Module¶ Returns a
Module
with a function of two arguments that raises the first argument of typebase_dtype
to the power of the second argument of typeexponent_dtype
(an integer or real data type).If
exponent_dtype
orout_dtype
are not given, they default tobase_dtype
. Ifbase_dtype
is not the same asout_dtype
, the input is cast toout_dtype
before exponentiation. Ifexponent_dtype
is real, but bothbase_dtype
andout_dtype
are integer, aValueError
is raised.- Parameters
base_dtype –
exponent_dtype –
out_dtype –
Virtual buffers¶
Often one needs temporary buffers that are only used in one place in the code, but used many times. Allocating them each time they are used may involve too much overhead; allocating real buffers and storing them increases the program’s memory requirements. A possible middle ground is using virtual allocations, where several of them can use the samy physical allocation. The virtual allocation manager will make sure that two virtual buffers that are used simultaneously (as declared by the user) will not share the same physical space.
-
class
grunnur.virtual_alloc.
VirtualManager
(context: grunnur.Context)¶ Base class for a manager of virtual allocations.
- Parameters
context – an instance of
Context
.
-
allocator
(dependencies=None) → grunnur.virtual_alloc.VirtualAllocator¶ Create a callable to use for
Array
creation.- Parameters
dependencies – can be a
Array
instance (the ones containing persistent allocations will be ignored), an iterable with valid values, or an object with the attribute__virtual_allocations__
which is a valid value (the last two will be processed recursively).
-
pack
(queue)¶ Packs the real allocations possibly reducing total memory usage. This process can be slow and may synchronize the base queue.
-
statistics
() → VirtualAllocationStatistics¶ Returns allocation statistics.
-
class
grunnur.virtual_alloc.
TrivialManager
(*args, **kwds)¶ Trivial manager — allocates a separate buffer for each allocation request.
-
class
grunnur.virtual_alloc.
ZeroOffsetManager
(*args, **kwds)¶ Tries to assign several allocation requests to a single real allocation, if dependencies allow that. All virtual allocations start from the beginning of real allocations.
-
class
grunnur.virtual_alloc.
VirtualAllocator
(manager: VirtualManager, dependencies: Set[int])¶ A helper callable object to use as an allocator for
Array
creation. Encapsulates the dependencies (as identifiers, doesn’t hold references for actual objects).
-
class
grunnur.virtual_alloc.
VirtualAllocationStatistics
¶ Virtual allocation details.
-
real_num
: int¶ The number of physical allocations.
-
real_size_total
: int¶ The total size of physical allocations (in bytes).
-
real_sizes
: Dict[int, int]¶ A dictionary
size: count
with the counts for physical allocations of each size.
-
virtual_num
: int¶ The number of virtual allocations.
-
virtual_size_total
: int¶ The total size of virtual allocations (in bytes).
-
virtual_sizes
: Dict[int, int]¶ A dictionary
size: count
with the counts for virtual allocations of each size.
-
Kernel toolbox¶
There is a set of macros attached to any kernel depending on the API it is being compiled for:
-
GRUNNUR_CUDA_API
¶ If defined, specifies that the kernel is being compiled for CUDA API.
-
GRUNNUR_OPENCL_API
¶ If defined, specifies that the kernel is being compiled for CUDA API.
-
GRUNNUR_FAST_MATH
¶ If defined, specifies that the compilation for this kernel was requested with
fast_math == True
.
-
LOCAL_BARRIER
¶ Synchronizes threads inside a block.
-
FUNCTION
¶ Modifier for a device-only function declaration.
-
KERNEL
¶ Modifier for a kernel function declaration.
-
GLOBAL_MEM
¶ Modifier for a global memory pointer argument.
-
LOCAL_MEM_DECL
¶ Modifier for a statically allocated local memory variable.
-
LOCAL_MEM_DYNAMIC
¶ Modifier for a dynamically allocated local memory variable (CUDA only).
-
LOCAL_MEM
¶ Modifier for a local memory argument in device-only functions.
-
CONSTANT_MEM_DECL
¶ Modifier for a statically allocated constant memory variable.
-
CONSTANT_MEM
¶ Modifier for a constant memory argument in device-only functions.
-
INLINE
¶ Modifier for inline functions.
-
SIZE_T
¶ The type of local/global IDs and sizes. Equal to
unsigned int
for CUDA, andsize_t
for OpenCL (which can be 32- or 64-bit unsigned integer, depending on the device).
-
SIZE_T
get_global_size
(unsigned int dim)¶ Local, group and global identifiers and sizes. In case of CUDA mimic the behavior of corresponding OpenCL functions.
-
VSIZE_T
¶ The type of local/global IDs in the virtual grid. It is separate from
SIZE_T
because the former is intended to be equivalent to what the backend is using, whileVSIZE_T
is a separate type and can be made larger thanSIZE_T
in the future if necessary.
-
ALIGN
(int)¶ Used to specify an explicit alignment (in bytes) for fields in structures, as
typedef struct { char ALIGN(4) a; int b; } MY_STRUCT;