Class DriverAPINativeMethods.Streams
Groups all stream API calls
Namespace: ManagedCuda
Assembly: ManagedCuda.dll
Syntax
public static class Streams
Methods
cuStreamAddCallback(CUstream, CUstreamCallback, IntPtr, CUStreamAddCallbackFlags)
Adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cuStreamAddCallback call, the callback will be executed exactly once. The callback will block later work in the stream until it is finished.
The callback may be passed Success or an error code. In the event of a device error, all subsequently executed callbacks will receive an appropriate CUResult.
Callbacks must not make any CUDA API calls. Attempting to use a CUDA API will result in ErrorNotPermitted. Callbacks must not perform any synchronization that may depend on outstanding device work or other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.
This API requires compute capability 1.1 or greater. See cuDeviceGetAttribute or cuDeviceGetProperties to query compute capability. Attempting to use this API with earlier compute versions will return ErrorNotSupported.
Declaration
public static CUResult cuStreamAddCallback(CUstream hStream, CUstreamCallback callback, IntPtr userData, CUStreamAddCallbackFlags flags)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to add callback to |
CUstreamCallback | callback | The function to call once preceding stream operations are complete |
System.IntPtr | userData | User specified data to be passed to the callback function |
CUStreamAddCallbackFlags | flags | Reserved for future use; must be 0. |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidHandle. |
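As a rough illustration of the rules above, the following sketch registers a host callback and keeps the delegate reachable so that it cannot be garbage collected while native code still holds its function pointer. It assumes that a CUDA context is current on the calling thread, that CUstream, CUResult, CUstreamCallback and CUStreamAddCallbackFlags live in the ManagedCuda.BasicTypes namespace, and that the delegate takes (CUstream, CUResult, IntPtr). CallbackExample and its members are hypothetical names.

```csharp
using System;
using System.Threading;
using ManagedCuda;
using ManagedCuda.BasicTypes; // assumed namespace of CUstream, CUResult, CUstreamCallback

static class CallbackExample
{
    // Signalled by the callback once all preceding stream work has finished.
    private static readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);

    // Held in a static field so the garbage collector cannot reclaim the
    // delegate while the driver still holds the native function pointer.
    private static readonly CUstreamCallback OnStreamDone = StreamDone;

    // Assumed delegate shape: (stream, status, userData).
    // No CUDA API calls are made in here, as the documentation requires.
    private static void StreamDone(CUstream hStream, CUResult status, IntPtr userData)
    {
        Console.WriteLine("Stream work finished with status " + status);
        Done.Set();
    }

    // Assumes a CUDA context is current and hStream is a valid stream.
    public static void NotifyWhenIdle(CUstream hStream)
    {
        CUResult res = DriverAPINativeMethods.Streams.cuStreamAddCallback(
            hStream, OnStreamDone, IntPtr.Zero, (CUStreamAddCallbackFlags)0); // flags must be 0
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamAddCallback failed: " + res);

        // Host-side wait; the callback itself never touches the CUDA API.
        Done.Wait();
    }
}
```

The callback only signals a host-side event, which keeps it within the "no CUDA API calls" restriction; any follow-up CUDA work would be issued by the thread that waits on the event.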
cuStreamAttachMemAsync(CUstream, CUdeviceptr, SizeT, CUmemAttach_flags)
Attach memory to a stream asynchronously
Enqueues an operation in hStream to specify stream association of length bytes of memory starting from dptr. This function is a stream-ordered operation, meaning that it is dependent on, and will only take effect when, previous work in stream has completed. Any previous association is automatically replaced.
dptr must point to an address within managed memory space declared using the __managed__ keyword or allocated with cuMemAllocManaged.
length must be zero, to indicate that the entire allocation's stream association is being changed. Currently, it's not possible to change stream association for a portion of an allocation.
The stream association is specified using flags, which must be one of CUmemAttach_flags:
- Global: the memory can be accessed by any stream on any device.
- Host: the program makes a guarantee that it won't access the memory on the device from any stream.
- Single: the program makes a guarantee that it will only access the memory on the device from hStream. It is illegal to attach singly to the NULL stream, because the NULL stream is a virtual global stream and not a specific stream. An error will be returned in this case.
When memory is associated with a single stream, the Unified Memory system will allow CPU access to this memory region so long as all operations in hStream have completed, regardless of whether other streams are active. In effect, this constrains exclusive ownership of the managed memory region by an active GPU to per-stream activity instead of whole-GPU activity.
Accessing memory on the device from streams that are not associated with it will produce undefined results. No error checking is performed by the Unified Memory system to ensure that kernels launched into other streams do not access this region.
It is the program's responsibility to order calls to cuStreamAttachMemAsync(CUstream, CUdeviceptr, SizeT, CUmemAttach_flags) via events, synchronization or other means to ensure legal access to memory at all times. Data visibility and coherency will be changed appropriately for all kernels which follow a stream-association change.
If hStream is destroyed while data is associated with it, the association is removed and the association reverts to the default visibility of the allocation as specified at cuMemAllocManaged. For __managed__ variables, the default association is always Global. Note that destroying a stream is an asynchronous operation, and as a result, the change to default association won't happen until all work in the stream has completed.
Declaration
public static CUResult cuStreamAttachMemAsync(CUstream hStream, CUdeviceptr dptr, SizeT length, CUmemAttach_flags flags)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream in which to enqueue the attach operation |
CUdeviceptr | dptr | Pointer to memory (must be a pointer to managed memory) |
SizeT | length | Length of memory (must be zero) |
CUmemAttach_flags | flags | Must be one of CUmemAttach_flags |
Returns
Type | Description |
---|---|
CUResult |
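A minimal sketch of the single-stream attach described above follows. It assumes a current CUDA context, a valid stream other than the NULL stream, and a pointer that was obtained from cuMemAllocManaged elsewhere. The ManagedCuda.BasicTypes namespace and the SizeT(int) constructor are assumptions about the library, and AttachExample is a hypothetical name.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes; // assumed namespace of CUstream, CUdeviceptr, SizeT

static class AttachExample
{
    // Assumes managedPtr points to managed memory (for example from
    // cuMemAllocManaged) and hStream is not the NULL stream.
    public static void AttachToSingleStream(CUstream hStream, CUdeviceptr managedPtr)
    {
        // length must be zero: the whole allocation changes association.
        CUResult res = DriverAPINativeMethods.Streams.cuStreamAttachMemAsync(
            hStream, managedPtr, new SizeT(0), CUmemAttach_flags.Single);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamAttachMemAsync failed: " + res);

        // The attach is stream-ordered, so wait for hStream before the CPU
        // touches the memory.
        res = DriverAPINativeMethods.Streams.cuStreamSynchronize(hStream);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamSynchronize failed: " + res);
    }
}
```

After this call returns, CPU access to the region depends only on work in hStream having completed, which is the per-stream ownership described above.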
cuStreamCreate(ref CUstream, CUStreamFlags)
Creates a stream and returns a handle in phStream. The Flags argument determines behaviors of the stream. Valid values for Flags are:
- Default: Default stream creation flag.
- NonBlocking: Specifies that work running in the created stream may run concurrently with work in stream 0 (the NULL stream), and that the created stream should perform no implicit synchronization with stream 0.
Declaration
public static CUResult cuStreamCreate(ref CUstream phStream, CUStreamFlags Flags)
Parameters
Type | Name | Description |
---|---|---|
CUstream | phStream | Returned newly created stream |
CUStreamFlags | Flags | Parameters for stream creation |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue, ErrorOutOfMemory. |
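The typical lifetime of a stream can be sketched as follows: create it, enqueue work, wait for the work, then destroy it. The example assumes a current CUDA context and that the basic types live in ManagedCuda.BasicTypes. StreamLifetimeExample and the enqueueWork delegate are hypothetical and stand in for whatever kernel launches or copies the caller issues.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes; // assumed namespace of CUstream, CUStreamFlags, CUResult

static class StreamLifetimeExample
{
    // Assumes a CUDA context is current on the calling thread.
    public static void RunOnNonBlockingStream(Action<CUstream> enqueueWork)
    {
        CUstream stream = new CUstream();
        CUResult res = DriverAPINativeMethods.Streams.cuStreamCreate(
            ref stream, CUStreamFlags.NonBlocking); // no implicit sync with stream 0
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamCreate failed: " + res);

        try
        {
            enqueueWork(stream); // caller-supplied asynchronous work

            res = DriverAPINativeMethods.Streams.cuStreamSynchronize(stream);
            if (res != CUResult.Success)
                throw new InvalidOperationException("cuStreamSynchronize failed: " + res);
        }
        finally
        {
            // Use the non-obsolete destroy entry point.
            DriverAPINativeMethods.Streams.cuStreamDestroy_v2(stream);
        }
    }
}
```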
cuStreamCreateWithPriority(ref CUstream, CUStreamFlags, Int32)
Create a stream with the given priority
Creates a stream with the specified priority and returns a handle in phStream.
This API alters the scheduler priority of work in the stream. Work in a higher priority stream may preempt work already executing in a low priority stream.
priority follows a convention where lower numbers represent higher priorities. '0' represents default priority. The range of meaningful numerical priorities can be queried using cuCtxGetStreamPriorityRange(ref Int32, ref Int32). If the specified priority is outside the numerical range returned by cuCtxGetStreamPriorityRange(ref Int32, ref Int32), it will automatically be clamped to the lowest or the highest number in the range.
Declaration
public static CUResult cuStreamCreateWithPriority(ref CUstream phStream, CUStreamFlags flags, int priority)
Parameters
Type | Name | Description |
---|---|---|
CUstream | phStream | Returned newly created stream |
CUStreamFlags | flags | Flags for stream creation. See cuStreamCreate(ref CUstream, CUStreamFlags) for a list of valid flags. |
System.Int32 | priority | Stream priority. Lower numbers represent higher priorities. See cuCtxGetStreamPriorityRange(ref Int32, ref Int32) for more information about meaningful stream priorities that can be passed. |
Returns
Type | Description |
---|---|
CUResult |
Remarks
Stream priorities are supported only on Quadro and Tesla GPUs with compute capability 3.5 or higher.
In the current implementation, only compute kernels launched in priority streams are affected by the stream's priority.
Stream priorities have no effect on host-to-device and device-to-host memory operations.
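The clamping rule described for the priority argument can be modeled in plain C#. This only illustrates the documented behavior and is not the driver's implementation. PriorityModel, ClampPriority and the bounds used in the usage note are assumptions made for the sketch; real bounds would come from cuCtxGetStreamPriorityRange(ref Int32, ref Int32).

```csharp
using System;

static class PriorityModel
{
    // Clamps a requested priority into the numerical range spanned by the
    // two bounds. The bounds may be passed in either order, because lower
    // numbers mean higher priority and the "greatest" priority is therefore
    // the numerically smaller value.
    public static int ClampPriority(int requested, int boundA, int boundB)
    {
        int lowest = Math.Min(boundA, boundB);
        int highest = Math.Max(boundA, boundB);
        return Math.Max(lowest, Math.Min(highest, requested));
    }
}
```

For example, with hypothetical bounds of 0 and -1, a request of 5 would be clamped to 0 and a request of -7 would be clamped to -1.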
cuStreamDestroy(CUstream)
Destroys the stream specified by hStream.
Declaration
[Obsolete("Don't use this CUDA API call with CUDA version >= 4.0.")]
public static CUResult cuStreamDestroy(CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to destroy |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuStreamDestroy_v2(CUstream)
Destroys the stream specified by hStream.
If the device is still doing work in the stream hStream when cuStreamDestroy_v2(CUstream) is called, the function will return immediately and the resources associated with hStream will be released automatically once the device has completed all work in hStream.
Declaration
public static CUResult cuStreamDestroy_v2(CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to destroy |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
cuStreamGetFlags(CUstream, ref CUStreamFlags)
Query the flags of a given stream
Query the flags of a stream created using cuStreamCreate(ref CUstream, CUStreamFlags) or cuStreamCreateWithPriority(ref CUstream, CUStreamFlags, Int32) and return the flags in flags.
Declaration
public static CUResult cuStreamGetFlags(CUstream hStream, ref CUStreamFlags flags)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Handle to the stream to be queried |
CUStreamFlags | flags | Pointer to an unsigned integer in which the stream's flags are returned. The value returned in |
Returns
Type | Description |
---|---|
CUResult |
cuStreamGetPriority(CUstream, ref Int32)
Query the priority of a given stream
Query the priority of a stream created using cuStreamCreate(ref CUstream, CUStreamFlags) or cuStreamCreateWithPriority(ref CUstream, CUStreamFlags, Int32) and return the priority in priority. Note that if the stream was created with a priority outside the numerical range returned by cuCtxGetStreamPriorityRange(ref Int32, ref Int32), this function returns the clamped priority. See cuStreamCreateWithPriority(ref CUstream, CUStreamFlags, Int32) for details about priority clamping.
Declaration
public static CUResult cuStreamGetPriority(CUstream hStream, ref int priority)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Handle to the stream to be queried |
System.Int32 | priority | Pointer to a signed integer in which the stream's priority is returned |
Returns
Type | Description |
---|---|
CUResult |
cuStreamQuery(CUstream)
Returns Success if all operations in the stream specified by hStream have completed, or ErrorNotReady if not.
Declaration
public static CUResult cuStreamQuery(CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to query status of |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidHandle, ErrorNotReady. |
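Because cuStreamQuery returns ErrorNotReady instead of blocking, it can be used to poll a stream with a timeout. The following is a sketch under the assumption that a CUDA context is current and that the basic types live in ManagedCuda.BasicTypes. StreamPolling, WaitWithTimeout and the one millisecond sleep interval are hypothetical choices.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using ManagedCuda;
using ManagedCuda.BasicTypes; // assumed namespace of CUstream and CUResult

static class StreamPolling
{
    // Returns true if the stream became idle within the timeout, false
    // otherwise. Throws on any result other than Success or ErrorNotReady.
    public static bool WaitWithTimeout(CUstream hStream, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            CUResult res = DriverAPINativeMethods.Streams.cuStreamQuery(hStream);
            if (res == CUResult.Success)
                return true;                 // all operations have completed
            if (res != CUResult.ErrorNotReady)
                throw new InvalidOperationException("cuStreamQuery failed: " + res);
            if (clock.Elapsed >= timeout)
                return false;                // still busy, give up
            Thread.Sleep(1);                 // yield instead of spinning
        }
    }
}
```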
cuStreamSynchronize(CUstream)
Waits until the device has completed all operations in the stream specified by hStream. If the context was created with the BlockingSync flag, the CPU thread will block until the stream is finished with all of its tasks.
Declaration
public static CUResult cuStreamSynchronize(CUstream hStream)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to wait for |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidHandle. |
cuStreamWaitEvent(CUstream, CUevent, UInt32)
Make a compute stream wait on an event
Makes all future work submitted to hStream wait until hEvent reports completion before beginning execution. This synchronization will be performed efficiently on the device.
The stream hStream will wait only for the completion of the most recent host call to cuEventRecord(CUevent, CUstream) on hEvent. Once this call has returned, any functions (including cuEventRecord(CUevent, CUstream) and cuEventDestroy(CUevent)) may be called on hEvent again, and the subsequent calls will not have any effect on hStream.
If hStream is 0 (the NULL stream), any future work submitted in any stream will wait for hEvent to complete before beginning execution. This effectively creates a barrier for all future work submitted to the context.
If cuEventRecord(CUevent, CUstream) has not been called on hEvent, this call acts as if the record has already completed, and so is a functional no-op.
The Flags argument must be 0.
Declaration
public static CUResult cuStreamWaitEvent(CUstream hStream, CUevent hEvent, uint Flags)
Parameters
Type | Name | Description |
---|---|---|
CUstream | hStream | Stream to wait |
CUevent | hEvent | Event to wait on |
System.UInt32 | Flags | Must be 0. |
Returns
Type | Description |
---|---|
CUResult | CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue. |
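A common use of cuStreamWaitEvent is to make one stream wait for a point in another stream without blocking the host. The sketch below assumes a current CUDA context, two valid streams, and an event that was created elsewhere. It also assumes that cuEventRecord(CUevent, CUstream) is exposed on a nested DriverAPINativeMethods.Events class, which this page does not confirm. StreamDependency and MakeConsumerWait are hypothetical names.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes; // assumed namespace of CUstream, CUevent, CUResult

static class StreamDependency
{
    // Work enqueued in consumer after this call will not start until all
    // work enqueued in producer before this call has completed.
    public static void MakeConsumerWait(CUstream producer, CUstream consumer, CUevent evt)
    {
        // Mark the current tail of the producer stream.
        // Assumption: cuEventRecord lives in DriverAPINativeMethods.Events.
        CUResult res = DriverAPINativeMethods.Events.cuEventRecord(evt, producer);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuEventRecord failed: " + res);

        // Make the consumer stream wait for that mark; Flags must be 0.
        res = DriverAPINativeMethods.Streams.cuStreamWaitEvent(consumer, evt, 0);
        if (res != CUResult.Success)
            throw new InvalidOperationException("cuStreamWaitEvent failed: " + res);
    }
}
```

Since the wait only covers the most recent record on the event, the event can be re-recorded or destroyed once both calls have returned without changing what the consumer stream waits for.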