Class DriverAPINativeMethods.AsynchronousMemcpy_v2
Any host memory involved must be DMA-capable, that is, page-locked (e.g., allocated with cuMemAllocHost). Memory copies issued through these functions execute in parallel with the CPU and, if the hardware supports it, may also overlap with GPU execution. Every asynchronous copy must be paired with appropriate stream synchronization before its result is used.
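The pattern described above can be sketched roughly as follows: stage the data in page-locked memory, queue the copy, and synchronize the stream before releasing the staging buffer. This is a minimal sketch, not the library's prescribed usage. It assumes a CUDA context is current on the calling thread, that the handle types live in the `ManagedCuda.BasicTypes` namespace, and that `DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2`, `DriverAPINativeMethods.MemoryManagement.cuMemFreeHost`, and `DriverAPINativeMethods.Streams.cuStreamSynchronize` exist with the signatures used here; none of those three is documented on this page, so check them against your ManagedCuda version. `PinnedUploadSketch` and `Check` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

static class PinnedUploadSketch
{
    // Hypothetical helper: turn any non-success CUResult into an exception.
    static void Check(CUResult res, string call)
    {
        if (res != CUResult.Success)
            throw new InvalidOperationException(call + " failed: " + res);
    }

    // Assumes a current CUDA context, a valid device allocation `dst` of at
    // least data.Length bytes, and a valid stream handle `stream`.
    public static void Upload(byte[] data, CUdeviceptr dst, CUstream stream)
    {
        SizeT bytes = data.Length;
        IntPtr pinned = IntPtr.Zero;

        // Assumed wrapper for cuMemAllocHost: allocates page-locked host memory.
        Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref pinned, bytes),
              "cuMemAllocHost_v2");
        try
        {
            // Stage the managed array in the page-locked buffer.
            Marshal.Copy(data, 0, pinned, data.Length);

            // Queue the copy; the call returns before the transfer completes.
            Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyHtoDAsync_v2(
                      dst, pinned, bytes, stream),
                  "cuMemcpyHtoDAsync_v2");

            // The CPU is free to do unrelated work here.

            // Assumed wrapper for cuStreamSynchronize: wait for the copy to finish
            // before the staging buffer is released.
            Check(DriverAPINativeMethods.Streams.cuStreamSynchronize(stream),
                  "cuStreamSynchronize");
        }
        finally
        {
            // Assumed wrapper for cuMemFreeHost.
            DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(pinned);
        }
    }
}
```

The synchronization comes before the `finally` block on the success path so that the page-locked buffer is not freed while the transfer may still be reading from it.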
Namespace: ManagedCuda
Assembly: ManagedCuda.dll
Syntax
public static class AsynchronousMemcpy_v2
Methods
cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream)
Perform a 2D memory copy according to the parameters specified in pCopy. See CUDAMemCpy2D.
cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) returns an error if any pitch is greater than the maximum allowed (memPitch).
cuMemAllocPitch_v2(ref CUdeviceptr, ref SizeT, SizeT, SizeT, UInt32) passes back pitches that always work with cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream). On intra-device
memory copies (device <-> device, CUDA array <-> device, CUDA array <-> CUDA array), cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) may fail
for pitches not computed by cuMemAllocPitch_v2(ref CUdeviceptr, ref SizeT, SizeT, SizeT, UInt32). The synchronous cuMemcpy2DUnaligned_v2(ref CUDAMemCpy2D) does not have this restriction, but
may run significantly slower in the cases where cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) would have returned an error code.
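The pitch constraint above can be checked on the host before a copy is queued. The sketch below is a hypothetical pre-flight helper (`PitchCheckSketch.IsPitchUsable` is not part of ManagedCuda). It assumes the caller has already obtained the device's maximum pitch (memPitch) and also applies the CUDA driver rule that a pitch must cover the row width plus the X offset in bytes. Passing this check does not guarantee success on intra-device copies, where pitches returned by cuMemAllocPitch_v2 remain the safe choice.

```csharp
using System;

static class PitchCheckSketch
{
    // Returns true when `pitch` is large enough to hold one row starting at
    // `xOffsetInBytes` and does not exceed the device limit `maxPitch`.
    // All values are in bytes; `maxPitch` must be supplied by the caller.
    public static bool IsPitchUsable(ulong widthInBytes, ulong xOffsetInBytes,
                                     ulong pitch, ulong maxPitch)
    {
        return pitch >= widthInBytes + xOffsetInBytes && pitch <= maxPitch;
    }
}
```

For example, with a limit of 2147483647 bytes, `IsPitchUsable(1000, 0, 1024, 2147483647)` returns `true`, while a pitch of 512 for the same 1000-byte row returns `false` because the row does not fit.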
Declaration
public static CUResult cuMemcpy2DAsync_v2(ref CUDAMemCpy2D pCopy, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUDAMemCpy2D | pCopy | Parameters for the memory copy |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream)
Perform a 3D memory copy according to the parameters specified in pCopy. See CUDAMemCpy3D.
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream) returns an error if any pitch is greater than the maximum allowed (memPitch).
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream
argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed
as input.
The srcLOD and dstLOD members of the CUDAMemCpy3D structure must be set to 0.
Declaration
public static CUResult cuMemcpy3DAsync_v2(ref CUDAMemCpy3D pCopy, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUDAMemCpy3D | pCopy | Parameters for the memory copy |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpy3DPeerAsync(ref CUDAMemCpy3DPeer, CUstream)
Perform a 3D memory copy according to the parameters specified in
pCopy. See the definition of the CUDAMemCpy3DPeer structure
for documentation of its parameters.
Declaration
public static CUResult cuMemcpy3DPeerAsync(ref CUDAMemCpy3DPeer pCopy, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUDAMemCpy3DPeer | pCopy | Parameters for the memory copy |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyAsync(CUdeviceptr, CUdeviceptr, SizeT, CUstream)
Copies data between two pointers.
dst and src are base pointers of the destination and source, respectively.
ByteCount specifies the number of bytes to copy.
Note that this function infers the type of the transfer (host to host, host to
device, device to device, or device to host) from the pointer values. This
function is only allowed in contexts which support unified addressing.
Note that this function is asynchronous and can optionally be associated with
a stream by passing a non-zero hStream argument.
Declaration
public static CUResult cuMemcpyAsync(CUdeviceptr dst, CUdeviceptr src, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUdeviceptr | dst | Destination unified virtual address space pointer |
| CUdeviceptr | src | Source unified virtual address space pointer |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyAtoHAsync_v2(IntPtr, CUarray, SizeT, SizeT, CUstream)
Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray
and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies
the number of bytes to copy.
cuMemcpyAtoHAsync_v2(IntPtr, CUarray, SizeT, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream
argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed
as input.
Declaration
public static CUResult cuMemcpyAtoHAsync_v2(IntPtr dstHost, CUarray srcArray, SizeT srcOffset, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IntPtr | dstHost | Destination pointer |
| CUarray | srcArray | Source array |
| SizeT | srcOffset | Offset in bytes of source array |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyDtoDAsync_v2(CUdeviceptr, CUdeviceptr, SizeT, CUstream)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination
and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous
and can optionally be associated with a stream by passing a non-zero hStream argument.
Declaration
public static CUResult cuMemcpyDtoDAsync_v2(CUdeviceptr dstDevice, CUdeviceptr srcDevice, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUdeviceptr | dstDevice | Destination device pointer |
| CUdeviceptr | srcDevice | Source device pointer |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyDtoHAsync_v2(IntPtr, CUdeviceptr, SizeT, CUstream)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and
source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyDtoHAsync_v2(IntPtr, CUdeviceptr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero
hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory
is passed as input.
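Because the call returns before the transfer has finished, the destination buffer must not be read until the stream has been synchronized. The following is a minimal sketch of a read-back under stated assumptions: a CUDA context is current, `pinnedDst` points to page-locked host memory of at least `count` bytes, the handle types live in `ManagedCuda.BasicTypes`, and `DriverAPINativeMethods.Streams.cuStreamSynchronize` exists with the signature used here (it is not documented on this page). `ReadBackSketch` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

static class ReadBackSketch
{
    // Copies `count` bytes from device memory into a managed array.
    // `pinnedDst` must be page-locked; pageable memory makes the call fail.
    public static byte[] Download(IntPtr pinnedDst, CUdeviceptr src, int count, CUstream stream)
    {
        // Queue the device-to-host copy on the given stream.
        CUResult res = DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyDtoHAsync_v2(
            pinnedDst, src, count, stream);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuMemcpyDtoHAsync_v2 failed: " + res);

        // Assumed wrapper for cuStreamSynchronize: block until the copy has completed.
        res = DriverAPINativeMethods.Streams.cuStreamSynchronize(stream);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamSynchronize failed: " + res);

        // Only now is the content of pinnedDst guaranteed to be valid.
        byte[] result = new byte[count];
        Marshal.Copy(pinnedDst, result, 0, count);
        return result;
    }
}
```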
Declaration
public static CUResult cuMemcpyDtoHAsync_v2(IntPtr dstHost, CUdeviceptr srcDevice, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IntPtr | dstHost | Destination host pointer |
| CUdeviceptr | srcDevice | Source device pointer |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyHtoAAsync_v2(CUarray, SizeT, IntPtr, SizeT, CUstream)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and
starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount
specifies the number of bytes to copy.
cuMemcpyHtoAAsync_v2(CUarray, SizeT, IntPtr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero
hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory
is passed as input.
Declaration
public static CUResult cuMemcpyHtoAAsync_v2(CUarray dstArray, SizeT dstOffset, IntPtr srcHost, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUarray | dstArray | Destination array |
| SizeT | dstOffset | Offset in bytes of destination array |
| System.IntPtr | srcHost | Source host pointer |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyHtoDAsync_v2(CUdeviceptr, IntPtr, SizeT, CUstream)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination
and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoDAsync_v2(CUdeviceptr, IntPtr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream
argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as
input.
Declaration
public static CUResult cuMemcpyHtoDAsync_v2(CUdeviceptr dstDevice, IntPtr srcHost, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUdeviceptr | dstDevice | Destination device pointer |
| System.IntPtr | srcHost | Source host pointer |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuMemcpyPeerAsync(CUdeviceptr, CUcontext, CUdeviceptr, CUcontext, SizeT, CUstream)
Copies from device memory in one context to device memory in another
context. dstDevice is the base device pointer of the destination memory
and dstContext is the destination context. srcDevice is the base
device pointer of the source memory and srcContext is the source context.
ByteCount specifies the number of bytes to copy. Note that this function
is asynchronous with respect to the host and all work in other streams in
other devices.
Declaration
public static CUResult cuMemcpyPeerAsync(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, SizeT ByteCount, CUstream hStream)
Parameters
| Type | Name | Description |
|---|---|---|
| CUdeviceptr | dstDevice | Destination device pointer |
| CUcontext | dstContext | Destination context |
| CUdeviceptr | srcDevice | Source device pointer |
| CUcontext | srcContext | Source context |
| SizeT | ByteCount | Size of memory copy in bytes |
| CUstream | hStream | Stream identifier |
Returns
| Type | Description |
|---|---|
| CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |