
Other managedCuda classes


CudaPagelockedHostMemory: In order to use asynchronous copy methods (host to device or device to host), the host array must be allocated as pinned or page-locked memory. To realize this, CudaPagelockedHostMemory[2D,3D] allocates the memory using CUDA's cuMemHostAlloc. To simplify per-element access, the class provides an indexer to get or set single values. Be aware, however, that each per-element access crosses the managed/unmanaged memory boundary and must be marshaled, so this access path is not fast. For large amounts of data it is faster to copy a managed array to the unmanaged memory in one block.
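
A minimal sketch of the intended usage, assuming the class and method names below match the actual managedCuda API (spelling follows this wiki; check the library sources for the exact overloads):

```csharp
using ManagedCuda;

class PinnedCopySketch
{
    static void Main()
    {
        int n = 1 << 20;
        var ctx = new CudaContext();                        // bind a CUDA context to this thread

        // Pinned host buffer, allocated via cuMemHostAlloc under the hood.
        var pinned = new CudaPagelockedHostMemory<float>(n);
        var device = new CudaDeviceVariable<float>(n);      // device-side buffer
        var stream = new CudaStream();

        // Per-element access via the indexer is marshaled on every call --
        // convenient, but slow when touching millions of elements:
        pinned[0] = 42.0f;

        // Pinned memory is what enables asynchronous copies
        // (assumed method/overload names; check the library for exact signatures):
        device.AsyncCopyToDevice(pinned, stream.Stream);
        stream.Synchronize();

        pinned.Dispose();
        device.Dispose();
        ctx.Dispose();
    }
}
```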

CudaPagelockedHostMemory_[Type]: As the previous approach using generics and marshaling was not satisfying in terms of speed, and direct pointer arithmetic with generics is not possible in C#, I tried something new, which I would call "templates with C#" using T4: a T4 template generates all possible variants such as 'float', 'int4', etc., which then access memory directly via pointers. The performance of this approach is close to that of native arrays. If you want to use CudaPagelockedHostMemory with your own data types, simply copy the tt-file to your project and modify the list of types to process (but be aware of the license: managedCUDA is LGPL!).
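
A short sketch of the typed variant, assuming the generated class name and constructor match what the T4 template produces:

```csharp
using ManagedCuda;

class TypedPinnedSketch
{
    static void Main()
    {
        int n = 1 << 20;
        var ctx = new CudaContext();

        // T4-generated variant for 'float': the indexer reads and writes through
        // raw pointers, so per-element access is close to native array speed.
        var pinned = new CudaPagelockedHostMemory_float(n);

        for (int i = 0; i < n; i++)
            pinned[i] = i * 0.5f;   // direct pointer access, no per-call marshaling

        pinned.Dispose();
        ctx.Dispose();
    }
}
```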

CudaManagedMemory_[Type]: Using the same approach as for page-locked memory, CudaManagedMemory gives access in .NET to the full feature set of managed memory introduced with CUDA 6.5.
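
A hedged sketch of host-side access to managed memory; the constructor and attach-flag enum used here are assumptions and may differ from the generated class:

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

class ManagedMemorySketch
{
    static void Main()
    {
        int n = 256;
        var ctx = new CudaContext();

        // One allocation, visible to host and device alike
        // (assumed constructor/enum names; check the generated class):
        var data = new CudaManagedMemory_float(n, CUmemAttach_flags.Global);

        for (int i = 0; i < n; i++)
            data[i] = i;            // host-side access through direct pointers

        // A kernel could now work on the same allocation via its device pointer;
        // synchronize before touching the data on the host again.
        ctx.Synchronize();

        data.Dispose();
        ctx.Dispose();
    }
}
```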

CudaRegisteredHostMemory: In C++, registered host memory is normally allocated memory that, once registered, becomes usable for asynchronous copies. In the .NET world this does not work as expected: although CudaRegisteredHostMemory is part of managedCuda, it shouldn't be used. Use CudaPagelockedHostMemory instead.

CudaArray[1D,2D,3D]: Represents a CUArray. Either you specify an already existing CUArray as storage location, e.g. from graphics interop, or a new CUArray is created internally. Only if the inner CUArray was allocated by the constructor will it be freed on disposal.
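
A rough sketch of the ownership semantics, using assumed constructor overloads and enum names:

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

class CudaArraySketch
{
    static void Main()
    {
        var ctx = new CudaContext();

        // The CUArray is created internally here, so Dispose() frees it
        // (assumed parameter/enum names; check the actual constructor overloads):
        var owned = new CudaArray2D(CUArrayFormat.Float, 512, 512, CudaArray2DNumChannels.One);
        owned.Dispose();

        // Wrapping an already existing CUArray handle, e.g. from graphics interop,
        // would look roughly like this; Dispose() then leaves the handle alone:
        // var wrapped = new CudaArray2D(existingHandle, false);

        ctx.Dispose();
    }
}
```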

GraphicsInterop: Several graphics interop resource classes exist, one for each graphics API (DirectX or OpenGL). All of these resources must be registered and can then be mapped to CUDA variables, CUDA textures or CUDA arrays, depending on their type. For efficient mapping, all resources can be grouped in a CudaGraphicsInteropResourceCollection, so that a single Map() call is enough to finish the task. Have a look at the sample applications to see how to use the collection.
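
A rough sketch of the grouping idea only; the Map()/UnMap() signatures are assumptions, and the actual registration code for each graphics API is best taken from the sample applications:

```csharp
using ManagedCuda;

// Sketch of the grouping idea: the concrete interop resource classes and their
// registration calls depend on the graphics API and are omitted here.
class InteropCollectionSketch
{
    static void RenderStep(CudaGraphicsInteropResourceCollection resources)
    {
        // One Map() call maps every registered resource in the collection
        // (assumed parameterless overload):
        resources.Map();

        // ...work with the mapped device pointers / arrays here...

        // One call releases them again (assumed method name):
        resources.UnMap();
    }
}
```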
