Class DriverAPINativeMethods.AsynchronousMemcpy_v2
Any host memory involved must be page-locked (DMA-capable), e.g. allocated with cuMemAllocHost. Copies issued through these functions execute in parallel with the CPU and, if the hardware supports it, may also execute in parallel with the GPU. Asynchronous copies must be accompanied by appropriate stream synchronization before the host reads or reuses the buffers involved.
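As a minimal sketch of the pinned-memory and synchronization rules above, the snippet below stages data in a buffer from cuMemAllocHost_v2, queues an upload and a download on one stream, and reads the buffer only after the stream has drained. It assumes ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers and the DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2 / cuMemFreeHost bindings; the PinnedRoundTrip class and Check helper are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda's CudaContext / CudaStream / CudaDeviceVariable<T> wrappers and the
// MemoryManagement.cuMemAllocHost_v2 / cuMemFreeHost bindings. Names here are illustrative.
static class PinnedRoundTrip
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Run(float[] input)
    {
        int bytes = input.Length * sizeof(float);
        using (var ctx = new CudaContext(0))                 // context on device 0, made current
        using (var stream = new CudaStream())
        using (var dev = new CudaDeviceVariable<float>(input.Length))
        {
            IntPtr pinned = IntPtr.Zero;
            try
            {
                // Page-locked (DMA-capable) staging buffer, as the async copies require.
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref pinned, bytes));
                Marshal.Copy(input, 0, pinned, input.Length);

                // Both copies are queued on the same stream, so they execute in order.
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyHtoDAsync_v2(
                    dev.DevicePointer, pinned, bytes, stream.Stream));
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyDtoHAsync_v2(
                    pinned, dev.DevicePointer, bytes, stream.Stream));

                // The calls above may return before any data has moved:
                // wait for the stream before the CPU touches the pinned buffer.
                stream.Synchronize();

                var output = new float[input.Length];
                Marshal.Copy(pinned, output, 0, input.Length);
                return output;
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(pinned);
            }
        }
    }
}
```

Reading the pinned buffer before `stream.Synchronize()` returns could observe stale data, which is exactly the synchronization requirement stated above.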
Namespace: ManagedCuda
Assembly: ManagedCuda.dll
Syntax
public static class AsynchronousMemcpy_v2
Methods
cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream)
Performs a 2D memory copy according to the parameters specified in pCopy. See CUDAMemCpy2D.
cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) returns an error if any pitch is greater than the maximum allowed (memPitch).
cuMemAllocPitch_v2(ref CUdeviceptr, ref SizeT, SizeT, SizeT, UInt32) passes back pitches that always work with cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream). On intra-device memory copies (device <-> device, CUDA array <-> device, CUDA array <-> CUDA array), cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) may fail for pitches not computed by cuMemAllocPitch_v2(ref CUdeviceptr, ref SizeT, SizeT, SizeT, UInt32). The synchronous cuMemcpy2DUnaligned_v2(ref CUDAMemCpy2D) does not have this restriction, but it may run significantly slower in the cases where cuMemcpy2DAsync_v2(ref CUDAMemCpy2D, CUstream) would have returned an error code.
Declaration
public static CUResult cuMemcpy2DAsync_v2(ref CUDAMemCpy2D pCopy, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUDAMemCpy2D | pCopy | Parameters for the memory copy |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
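A minimal sketch of a pitched 2D round trip, assuming ManagedCuda's CudaContext and CudaStream wrappers, the MemoryManagement bindings cuMemAllocPitch_v2, cuMemAllocHost_v2, cuMemFreeHost and cuMemFree_v2, and CUDAMemCpy2D field names that mirror the native CUDA_MEMCPY2D struct (srcMemoryType, srcHost, srcPitch, dstDevice, dstPitch, WidthInBytes, Height). The device pitch comes from cuMemAllocPitch_v2, so the copy avoids the pitch restriction described above. The Pitched2DCopy class and Check helper are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers/bindings named in the lead-in; CUDAMemCpy2D field names
// are assumed to mirror CUDA_MEMCPY2D. Class and helper names are illustrative.
static class Pitched2DCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    // Uploads a row-major width x height float image into pitched device memory, then reads it back.
    public static float[] RoundTrip(float[] image, int width, int height)
    {
        int rowBytes = width * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        {
            CUdeviceptr dev = new CUdeviceptr();
            SizeT pitch = 0;
            IntPtr host = IntPtr.Zero;
            try
            {
                // A pitch obtained from cuMemAllocPitch_v2 always works with cuMemcpy2DAsync_v2.
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocPitch_v2(
                    ref dev, ref pitch, rowBytes, height, sizeof(float)));
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, rowBytes * height));
                Marshal.Copy(image, 0, host, image.Length);

                // Host rows are tightly packed (pitch == rowBytes); device rows are 'pitch' bytes apart.
                var up = new CUDAMemCpy2D
                {
                    srcMemoryType = CUMemoryType.Host, srcHost = host, srcPitch = rowBytes,
                    dstMemoryType = CUMemoryType.Device, dstDevice = dev, dstPitch = pitch,
                    WidthInBytes = rowBytes, Height = height
                };
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpy2DAsync_v2(ref up, stream.Stream));
                stream.Synchronize();

                // Wipe the host buffer so the download below is observable.
                Marshal.Copy(new float[image.Length], 0, host, image.Length);

                var down = new CUDAMemCpy2D
                {
                    srcMemoryType = CUMemoryType.Device, srcDevice = dev, srcPitch = pitch,
                    dstMemoryType = CUMemoryType.Host, dstHost = host, dstPitch = rowBytes,
                    WidthInBytes = rowBytes, Height = height
                };
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpy2DAsync_v2(ref down, stream.Stream));
                stream.Synchronize();

                var result = new float[image.Length];
                Marshal.Copy(host, result, 0, image.Length);
                return result;
            }
            finally
            {
                // Return codes ignored on cleanup.
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
                DriverAPINativeMethods.MemoryManagement.cuMemFree_v2(dev);
            }
        }
    }
}
```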
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream)
Performs a 3D memory copy according to the parameters specified in pCopy. See CUDAMemCpy3D.
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream) returns an error if any pitch is greater than the maximum allowed (memPitch).
cuMemcpy3DAsync_v2(ref CUDAMemCpy3D, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
The srcLOD and dstLOD members of the CUDAMemCpy3D structure must be set to 0.
Declaration
public static CUResult cuMemcpy3DAsync_v2(ref CUDAMemCpy3D pCopy, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUDAMemCpy3D | pCopy | Parameters for the memory copy |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
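A minimal sketch of uploading a dense w x h x d volume into linear device memory. It assumes ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers, the cuMemAllocHost_v2 / cuMemFreeHost bindings, and CUDAMemCpy3D fields mirroring CUDA_MEMCPY3D (srcPitch, srcHeight, dstPitch, dstHeight, WidthInBytes, Height, Depth); only srcLOD and dstLOD are confirmed by the text above. For tightly packed data, the pitch is one row in bytes and the "height" is the number of rows per slice. Volume3DCopy and Check are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers/bindings named in the lead-in; CUDAMemCpy3D field names other
// than srcLOD/dstLOD are assumed to mirror CUDA_MEMCPY3D. Names here are illustrative.
static class Volume3DCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    // Uploads a dense w x h x d float volume with cuMemcpy3DAsync_v2 and reads it back.
    public static float[] Upload(float[] volume, int w, int h, int d)
    {
        int rowBytes = w * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var dev = new CudaDeviceVariable<float>(volume.Length))
        {
            IntPtr host = IntPtr.Zero;
            try
            {
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, rowBytes * h * d));
                Marshal.Copy(volume, 0, host, volume.Length);

                var copy = new CUDAMemCpy3D
                {
                    // Both sides tightly packed: pitch = one row, height = rows per slice.
                    srcMemoryType = CUMemoryType.Host, srcHost = host, srcPitch = rowBytes, srcHeight = h,
                    dstMemoryType = CUMemoryType.Device, dstDevice = dev.DevicePointer,
                    dstPitch = rowBytes, dstHeight = h,
                    srcLOD = 0, dstLOD = 0,                  // must be 0
                    WidthInBytes = rowBytes, Height = h, Depth = d
                };
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpy3DAsync_v2(ref copy, stream.Stream));
                stream.Synchronize();                        // source buffer must outlive the copy
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
            }

            // Synchronous read-back to check the upload.
            var result = new float[volume.Length];
            dev.CopyToHost(result);
            return result;
        }
    }
}
```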
cuMemcpy3DPeerAsync(ref CUDAMemCpy3DPeer, CUstream)
Performs a 3D memory copy according to the parameters specified in pCopy. See the definition of the CUDAMemCpy3DPeer structure for documentation of its parameters.
Declaration
public static CUResult cuMemcpy3DPeerAsync(ref CUDAMemCpy3DPeer pCopy, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUDAMemCpy3DPeer | pCopy | Parameters for the memory copy |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
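A minimal sketch of a device-to-device 3D copy through CUDAMemCpy3DPeer. To keep it runnable on a single GPU it passes the same context as srcContext and dstContext; in real use the two contexts would typically belong to different devices. It assumes ManagedCuda's CudaContext (with its Context property), CudaStream and CudaDeviceVariable&lt;T&gt;, and CUDAMemCpy3DPeer field names mirroring the native CUDA_MEMCPY3D_PEER struct. Peer3DCopy and Check are hypothetical names.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers named in the lead-in; CUDAMemCpy3DPeer field names are
// assumed to mirror CUDA_MEMCPY3D_PEER. Class and helper names are illustrative.
static class Peer3DCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    // Copies a dense w x h x d float volume between two device buffers.
    public static float[] Run(float[] volume, int w, int h, int d)
    {
        int rowBytes = w * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var src = new CudaDeviceVariable<float>(volume.Length))
        using (var dst = new CudaDeviceVariable<float>(volume.Length))
        {
            src.CopyToDevice(volume);

            var copy = new CUDAMemCpy3DPeer
            {
                // Assumption: same context on both sides so the sketch runs on one GPU.
                srcMemoryType = CUMemoryType.Device, srcDevice = src.DevicePointer,
                srcContext = ctx.Context, srcPitch = rowBytes, srcHeight = h,
                dstMemoryType = CUMemoryType.Device, dstDevice = dst.DevicePointer,
                dstContext = ctx.Context, dstPitch = rowBytes, dstHeight = h,
                WidthInBytes = rowBytes, Height = h, Depth = d
            };
            Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpy3DPeerAsync(ref copy, stream.Stream));
            stream.Synchronize();

            var result = new float[volume.Length];
            dst.CopyToHost(result);
            return result;
        }
    }
}
```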
cuMemcpyAsync(CUdeviceptr, CUdeviceptr, SizeT, CUstream)
Copies data between two pointers. dst and src are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
This function infers the type of transfer (host to host, host to device, device to device, or device to host) from the pointer values. It is only allowed in contexts that support unified addressing.
This function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.
Declaration
public static CUResult cuMemcpyAsync(CUdeviceptr dst, CUdeviceptr src, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | dst | Destination unified virtual address space pointer |
CUdeviceptr | src | Source unified virtual address space pointer |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
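A minimal sketch of cuMemcpyAsync between two device allocations, where the transfer direction is inferred from the addresses. It assumes a context with unified addressing and ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers; UnifiedCopy and Check are hypothetical names.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers named in the lead-in and a context that supports
// unified addressing. Class and helper names are illustrative.
static class UnifiedCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Run(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var a = new CudaDeviceVariable<float>(data.Length))
        using (var b = new CudaDeviceVariable<float>(data.Length))
        {
            a.CopyToDevice(data);                            // synchronous upload of the source

            // No direction argument: the driver infers device -> device from the pointers.
            Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyAsync(
                b.DevicePointer, a.DevicePointer, bytes, stream.Stream));
            stream.Synchronize();

            var result = new float[data.Length];
            b.CopyToHost(result);
            return result;
        }
    }
}
```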
cuMemcpyAtoHAsync_v2(IntPtr, CUarray, SizeT, SizeT, CUstream)
Copies from a 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and the starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.
cuMemcpyAtoHAsync_v2(IntPtr, CUarray, SizeT, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.
Declaration
public static CUResult cuMemcpyAtoHAsync_v2(IntPtr dstHost, CUarray srcArray, SizeT srcOffset, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
System.IntPtr | dstHost | Destination host pointer |
CUarray | srcArray | Source array |
SizeT | srcOffset | Offset in bytes of source array |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
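A minimal sketch that fills a 1D float CUDA array and then reads a slice of it back asynchronously. Note that srcOffset and ByteCount are byte counts, not element counts. It assumes ManagedCuda's ArrayManagement.cuArrayCreate_v2 / cuArrayDestroy bindings, a CUDAArrayDescriptor with Width, Height, Format and NumChannels fields (Height = 0 for a 1D array), the synchronous SynchronousMemcpy_v2.cuMemcpyHtoA_v2 IntPtr overload, and the cuMemAllocHost_v2 / cuMemFreeHost bindings. ArrayToHostCopy and Check are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes the ManagedCuda bindings and CUDAArrayDescriptor layout named in the lead-in.
// Class and helper names are illustrative.
static class ArrayToHostCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    // Fills a 1D float array with 'data', then asynchronously reads 'count' elements from index 'first'.
    public static float[] ReadSlice(float[] data, int first, int count)
    {
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        {
            var desc = new CUDAArrayDescriptor
            {
                Width = data.Length, Height = 0, Format = CUArrayFormat.Float, NumChannels = 1
            };
            CUarray array = new CUarray();
            Check(DriverAPINativeMethods.ArrayManagement.cuArrayCreate_v2(ref array, ref desc));
            IntPtr host = IntPtr.Zero;
            try
            {
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, data.Length * sizeof(float)));
                Marshal.Copy(data, 0, host, data.Length);
                // Synchronous fill of the whole array (offset 0) from pinned memory.
                Check(DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpyHtoA_v2(
                    array, 0, host, data.Length * sizeof(float)));

                // srcOffset and ByteCount are in bytes.
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyAtoHAsync_v2(
                    host, array, first * sizeof(float), count * sizeof(float), stream.Stream));
                stream.Synchronize();

                var result = new float[count];
                Marshal.Copy(host, result, 0, count);
                return result;
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
                DriverAPINativeMethods.ArrayManagement.cuArrayDestroy(array);
            }
        }
    }
}
```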
cuMemcpyDtoDAsync_v2(CUdeviceptr, CUdeviceptr, SizeT, CUstream)
Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. This function is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument.
Declaration
public static CUResult cuMemcpyDtoDAsync_v2(CUdeviceptr dstDevice, CUdeviceptr srcDevice, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | dstDevice | Destination device pointer |
CUdeviceptr | srcDevice | Source device pointer |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
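A minimal sketch of an asynchronous device-to-device copy queued on a stream. It assumes ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers (DevicePointer, CopyToDevice, CopyToHost); DeviceToDeviceCopy and Check are hypothetical names. Because no host memory is involved, no page-locked buffer is needed here.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers named in the lead-in. Class and helper names are illustrative.
static class DeviceToDeviceCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Run(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var src = new CudaDeviceVariable<float>(data.Length))
        using (var dst = new CudaDeviceVariable<float>(data.Length))
        {
            src.CopyToDevice(data);                          // synchronous upload of the source

            Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyDtoDAsync_v2(
                dst.DevicePointer, src.DevicePointer, bytes, stream.Stream));
            stream.Synchronize();                            // wait before reading 'dst'

            var result = new float[data.Length];
            dst.CopyToHost(result);
            return result;
        }
    }
}
```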
cuMemcpyDtoHAsync_v2(IntPtr, CUdeviceptr, SizeT, CUstream)
Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyDtoHAsync_v2(IntPtr, CUdeviceptr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
Declaration
public static CUResult cuMemcpyDtoHAsync_v2(IntPtr dstHost, CUdeviceptr srcDevice, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
System.IntPtr | dstHost | Destination host pointer |
CUdeviceptr | srcDevice | Source device pointer |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
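A minimal sketch of downloading device data into a page-locked buffer and synchronizing before the CPU reads it. It assumes ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers plus the MemoryManagement.cuMemAllocHost_v2 / cuMemFreeHost bindings; DeviceToHostCopy and Check are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers/bindings named in the lead-in. Names here are illustrative.
static class DeviceToHostCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Download(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var dev = new CudaDeviceVariable<float>(data.Length))
        {
            dev.CopyToDevice(data);                          // synchronous upload to have something to read
            IntPtr host = IntPtr.Zero;
            try
            {
                // Destination is page-locked, as this function requires.
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, bytes));
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyDtoHAsync_v2(
                    host, dev.DevicePointer, bytes, stream.Stream));

                // Reading 'host' before this point could observe stale data.
                stream.Synchronize();

                var result = new float[data.Length];
                Marshal.Copy(host, result, 0, data.Length);
                return result;
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
            }
        }
    }
}
```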
cuMemcpyHtoAAsync_v2(CUarray, SizeT, IntPtr, SizeT, CUstream)
Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and the starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoAAsync_v2(CUarray, SizeT, IntPtr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
Declaration
public static CUResult cuMemcpyHtoAAsync_v2(CUarray dstArray, SizeT dstOffset, IntPtr srcHost, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUarray | dstArray | Destination array |
SizeT | dstOffset | Offset in bytes of destination array |
System.IntPtr | srcHost | Source host pointer |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
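A minimal sketch that writes a few values into a zeroed 1D float CUDA array at a byte offset, then reads the whole array back synchronously to show where they landed. It assumes ManagedCuda's ArrayManagement.cuArrayCreate_v2 / cuArrayDestroy bindings, a CUDAArrayDescriptor with Width, Height, Format and NumChannels fields (Height = 0 for 1D), the IntPtr overloads of SynchronousMemcpy_v2.cuMemcpyHtoA_v2 and cuMemcpyAtoH_v2, and the cuMemAllocHost_v2 / cuMemFreeHost bindings. HostToArrayCopy and Check are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes the ManagedCuda bindings and CUDAArrayDescriptor layout named in the lead-in.
// Class and helper names are illustrative.
static class HostToArrayCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    // Writes 'values' into a zeroed array of 'arrayLength' floats starting at element 'first'.
    public static float[] WriteAt(int arrayLength, float[] values, int first)
    {
        int arrayBytes = arrayLength * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        {
            var desc = new CUDAArrayDescriptor
            {
                Width = arrayLength, Height = 0, Format = CUArrayFormat.Float, NumChannels = 1
            };
            CUarray array = new CUarray();
            Check(DriverAPINativeMethods.ArrayManagement.cuArrayCreate_v2(ref array, ref desc));
            IntPtr host = IntPtr.Zero;
            try
            {
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, arrayBytes));

                // Zero the whole array synchronously so untouched elements are known.
                Marshal.Copy(new float[arrayLength], 0, host, arrayLength);
                Check(DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpyHtoA_v2(array, 0, host, arrayBytes));

                // Async upload: dstOffset and ByteCount are in bytes.
                Marshal.Copy(values, 0, host, values.Length);
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyHtoAAsync_v2(
                    array, first * sizeof(float), host, values.Length * sizeof(float), stream.Stream));
                stream.Synchronize();                        // 'host' is reused below

                Check(DriverAPINativeMethods.SynchronousMemcpy_v2.cuMemcpyAtoH_v2(host, array, 0, arrayBytes));
                var result = new float[arrayLength];
                Marshal.Copy(host, result, 0, arrayLength);
                return result;
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
                DriverAPINativeMethods.ArrayManagement.cuArrayDestroy(array);
            }
        }
    }
}
```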
cuMemcpyHtoDAsync_v2(CUdeviceptr, IntPtr, SizeT, CUstream)
Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy.
cuMemcpyHtoDAsync_v2(CUdeviceptr, IntPtr, SizeT, CUstream) is asynchronous and can optionally be associated with a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.
Declaration
public static CUResult cuMemcpyHtoDAsync_v2(CUdeviceptr dstDevice, IntPtr srcHost, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | dstDevice | Destination device pointer |
System.IntPtr | srcHost | Source host pointer |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
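A minimal sketch of an asynchronous upload from a page-locked buffer. The key point is that the source buffer must stay allocated and unmodified until the stream has finished the copy. It assumes ManagedCuda's CudaContext, CudaStream and CudaDeviceVariable&lt;T&gt; wrappers plus the cuMemAllocHost_v2 / cuMemFreeHost bindings; HostToDeviceCopy and Check are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers/bindings named in the lead-in. Names here are illustrative.
static class HostToDeviceCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Upload(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var dev = new CudaDeviceVariable<float>(data.Length))
        {
            IntPtr host = IntPtr.Zero;
            try
            {
                Check(DriverAPINativeMethods.MemoryManagement.cuMemAllocHost_v2(ref host, bytes));
                Marshal.Copy(data, 0, host, data.Length);
                Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyHtoDAsync_v2(
                    dev.DevicePointer, host, bytes, stream.Stream));

                // Do not free or overwrite 'host' until the queued copy has completed.
                stream.Synchronize();
            }
            finally
            {
                DriverAPINativeMethods.MemoryManagement.cuMemFreeHost(host);
            }

            // Synchronous read-back to check the upload.
            var result = new float[data.Length];
            dev.CopyToHost(result);
            return result;
        }
    }
}
```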
cuMemcpyPeerAsync(CUdeviceptr, CUcontext, CUdeviceptr, CUcontext, SizeT, CUstream)
Copies from device memory in one context to device memory in another context. dstDevice is the base device pointer of the destination memory and dstContext is the destination context. srcDevice is the base device pointer of the source memory and srcContext is the source context. ByteCount specifies the number of bytes to copy. This function is asynchronous with respect to the host and to all work in other streams on other devices.
Declaration
public static CUResult cuMemcpyPeerAsync(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, SizeT ByteCount, CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | dstDevice | Destination device pointer |
CUcontext | dstContext | Destination context |
CUdeviceptr | srcDevice | Source device pointer |
CUcontext | srcContext | Source context |
SizeT | ByteCount | Size of memory copy in bytes |
CUstream | hStream | Stream identifier |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
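A minimal sketch of cuMemcpyPeerAsync. So that it runs on a single GPU, it passes the same context as both dstContext and srcContext (an assumption that the driver then performs an ordinary device-to-device copy); with two GPUs you would normally pass each allocation's own context. It assumes ManagedCuda's CudaContext (with its Context property), CudaStream and CudaDeviceVariable&lt;T&gt;; PeerCopy and Check are hypothetical names.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Assumes ManagedCuda wrappers named in the lead-in. Class and helper names are illustrative.
static class PeerCopy
{
    static void Check(CUResult r)
    {
        if (r != CUResult.Success) throw new InvalidOperationException("CUDA error: " + r);
    }

    public static float[] Run(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var src = new CudaDeviceVariable<float>(data.Length))
        using (var dst = new CudaDeviceVariable<float>(data.Length))
        {
            src.CopyToDevice(data);

            // Assumption: identical source and destination contexts, so this runs on one GPU.
            Check(DriverAPINativeMethods.AsynchronousMemcpy_v2.cuMemcpyPeerAsync(
                dst.DevicePointer, ctx.Context, src.DevicePointer, ctx.Context, bytes, stream.Stream));
            stream.Synchronize();

            var result = new float[data.Length];
            dst.CopyToHost(result);
            return result;
        }
    }
}
```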