Class CudaStream
Wraps a CUstream handle. For the so-called NULL stream, use the native CUstream struct instead.
Inheritance: System.Object → CudaStream
Implements: System.IDisposable
Namespace: ManagedCuda
Assembly: ManagedCuda.dll
Syntax
public class CudaStream : IDisposable
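The following is a minimal sketch of the typical lifecycle. It assumes a CUDA-capable device 0 and uses ManagedCuda's CudaContext class (not documented on this page) to create the context. The stream is created with the parameterless constructor, drained with Synchronize(), checked with Query(), and released by the using statement, which calls Dispose(). StreamLifecycleDemo is a hypothetical helper name.

```csharp
using ManagedCuda;

public static class StreamLifecycleDemo
{
    // Creates a stream, waits for it, and reports whether it is idle.
    // Assumption: device 0 exists and supports CUDA.
    public static bool Run()
    {
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())   // CUStreamFlags.None, default priority
        {
            // ... enqueue asynchronous work on stream.Stream here ...

            stream.Synchronize();               // block until the stream has drained
            return stream.Query();              // true: no pending work remains
        }
    }
}
```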
Constructors
CudaStream()
Creates a new stream using CUStreamFlags.None.
Declaration
public CudaStream()
CudaStream(CUstream)
Creates a new wrapper for an existing stream
Declaration
public CudaStream(CUstream _stream)
Parameters
Type | Name | Description |
---|---|---|
CUstream | _stream | The existing stream handle to wrap. |
CudaStream(CUStreamFlags)
Creates a new stream with the given flags.
Declaration
public CudaStream(CUStreamFlags flags)
Parameters
Type | Name | Description |
---|---|---|
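CUStreamFlags | flags | Parameters for stream creation: CUStreamFlags.None (default), or CUStreamFlags.NonBlocking for a stream that does not synchronize with the NULL stream. |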
CudaStream(Int32)
Creates a new stream using CUStreamFlags.None and the given priority.
This API alters the scheduler priority of work in the stream. Work in a higher-priority stream may preempt work already executing in a lower-priority stream.
priority follows a convention where lower numbers represent higher priorities; 0 represents the default priority.
Declaration
public CudaStream(int priority)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | priority | Stream priority. Lower numbers represent higher priorities. |
CudaStream(Int32, CUStreamFlags)
Creates a new stream using the given flags and priority.
This API alters the scheduler priority of work in the stream. Work in a higher-priority stream may preempt work already executing in a lower-priority stream.
priority follows a convention where lower numbers represent higher priorities; 0 represents the default priority.
Declaration
public CudaStream(int priority, CUStreamFlags flags)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | priority | Stream priority. Lower numbers represent higher priorities. |
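CUStreamFlags | flags | Parameters for stream creation: CUStreamFlags.None (default), or CUStreamFlags.NonBlocking for a stream that does not synchronize with the NULL stream. |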
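This sketch creates one stream at the default priority and one that requests a higher priority (a numerically lower value), then reads both back with GetPriority(). It assumes device 0 and ManagedCuda's CudaContext. The CUDA driver clamps a requested priority to the range reported by cuCtxGetStreamPriorityRange, so on a device without priority support the value read back for -1 may be 0; the example therefore only checks that the second stream's priority is not numerically greater than the first's. StreamPriorityDemo is a hypothetical name.

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class StreamPriorityDemo
{
    // Returns the priorities actually assigned to a default-priority stream
    // and to a stream that asks for a higher (numerically lower) priority.
    public static (int normal, int high) Run()
    {
        using (var ctx = new CudaContext(0))
        using (var normal = new CudaStream(0))                      // 0 = default priority
        using (var high = new CudaStream(-1, CUStreamFlags.None))   // lower number = higher priority
        {
            // The driver may have clamped -1 into the device's supported range.
            return (normal.GetPriority(), high.GetPriority());
        }
    }
}
```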
Properties
Stream
Gets or sets the wrapped CUstream handle.
Declaration
public CUstream Stream { get; set; }
Property Value
Type | Description |
---|---|
CUstream | The wrapped stream handle. |
Methods
AddCallback(CUstreamCallback, IntPtr, CUStreamAddCallbackFlags)
Adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each call, the callback is executed exactly once. The callback blocks later work in the stream until it has finished.
The callback receives CUResult.Success or an error code. In the event of a device error, all subsequently executed callbacks receive an appropriate CUResult.
Callbacks must not make any CUDA API calls; attempting to do so results in ErrorNotPermitted. Callbacks must not perform any synchronization that may depend on outstanding device work or on other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.
This API requires compute capability 1.1 or greater. See cuDeviceGetAttribute or cuDeviceGetProperties to query the compute capability. Using this API with earlier compute versions returns ErrorNotSupported.
Declaration
public void AddCallback(CUstreamCallback callback, IntPtr userData, CUStreamAddCallbackFlags flags)
Parameters
Type | Name | Description |
---|---|---|
CUstreamCallback | callback | The function to call once preceding stream operations are complete |
System.IntPtr | userData | User-specified data passed to the callback function. To pass a managed object, obtain an IntPtr with GCHandle.Alloc and GCHandle.ToIntPtr. |
CUStreamAddCallbackFlags | flags | Callback flags (must be CUStreamAddCallbackFlags.None) |
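The following sketch shows how to pass a managed object to the callback through userData. It assumes that CUstreamCallback has the signature (CUstream hStream, CUResult status, IntPtr userData) and that device 0 exists. A GCHandle turns the object into an IntPtr and back again, and the delegate is stored in a static field so the garbage collector cannot reclaim it while the driver still holds a pointer to it. CallbackDemo is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class CallbackDemo
{
    // Stored in a static field so the delegate outlives every pending callback.
    static readonly CUstreamCallback OnDone = Callback;

    static void Callback(CUstream hStream, CUResult status, IntPtr userData)
    {
        // Host-side code only: CUDA API calls are not permitted in a callback.
        var done = (ManualResetEventSlim)GCHandle.FromIntPtr(userData).Target;
        if (status == CUResult.Success)
            done.Set();
    }

    // Returns true if the callback ran successfully within five seconds.
    public static bool Run()
    {
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var done = new ManualResetEventSlim(false))
        {
            GCHandle handle = GCHandle.Alloc(done);   // keeps `done` reachable for the callback
            try
            {
                stream.AddCallback(OnDone, GCHandle.ToIntPtr(handle), CUStreamAddCallbackFlags.None);
                return done.Wait(TimeSpan.FromSeconds(5));
            }
            finally
            {
                handle.Free();                        // only after the callback has signalled
            }
        }
    }
}
```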
AddCallbackToNullStream(CUstreamCallback, IntPtr, CUStreamAddCallbackFlags)
Adds a callback to the NULL stream, to be called on the host after all currently enqueued items in the NULL stream have completed. For each call, the callback is executed exactly once. The callback blocks later work in the stream until it has finished.
The callback receives CUResult.Success or an error code. In the event of a device error, all subsequently executed callbacks receive an appropriate CUResult.
Callbacks must not make any CUDA API calls; attempting to do so results in ErrorNotPermitted. Callbacks must not perform any synchronization that may depend on outstanding device work or on other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.
This API requires compute capability 1.1 or greater. See cuDeviceGetAttribute or cuDeviceGetProperties to query the compute capability. Using this API with earlier compute versions returns ErrorNotSupported.
Declaration
public static void AddCallbackToNullStream(CUstreamCallback callback, IntPtr userData, CUStreamAddCallbackFlags flags)
Parameters
Type | Name | Description |
---|---|---|
CUstreamCallback | callback | The function to call once preceding stream operations are complete |
System.IntPtr | userData | User-specified data passed to the callback function. To pass a managed object, obtain an IntPtr with GCHandle.Alloc and GCHandle.ToIntPtr. |
CUStreamAddCallbackFlags | flags | Callback flags (must be CUStreamAddCallbackFlags.None) |
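Because AddCallbackToNullStream is static, no CudaStream instance is needed. This sketch makes the same assumptions as for a regular callback (device 0, and a CUstreamCallback signature of (CUstream, CUResult, IntPtr)). It passes IntPtr.Zero as userData and signals the host through a static ManualResetEventSlim instead. NullStreamCallbackDemo is a hypothetical name.

```csharp
using System;
using System.Threading;
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class NullStreamCallbackDemo
{
    static readonly ManualResetEventSlim Done = new ManualResetEventSlim(false);
    static CUResult? _status;

    // Kept alive in a static field for as long as the driver may invoke it.
    static readonly CUstreamCallback OnDone = (hStream, status, userData) =>
    {
        _status = status;   // host-side bookkeeping only; no CUDA API calls here
        Done.Set();
    };

    // Returns true if the NULL-stream callback ran with CUResult.Success.
    public static bool Run()
    {
        using (var ctx = new CudaContext(0))
        {
            CudaStream.AddCallbackToNullStream(OnDone, IntPtr.Zero, CUStreamAddCallbackFlags.None);
            return Done.Wait(TimeSpan.FromSeconds(5)) && _status == CUResult.Success;
        }
    }
}
```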
cuStreamGetFlags()
Queries the flags of this stream.
Declaration
public CUStreamFlags cuStreamGetFlags()
Returns
Type | Description |
---|---|
CUStreamFlags | The stream's flags: a logical OR of all flags used when the stream was created. |
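This sketch creates a stream with non-default flags and reads them back. It assumes that CUStreamFlags exposes NonBlocking as the counterpart of the driver's CU_STREAM_NON_BLOCKING (a stream that does not synchronize with the NULL stream). Since the returned value is the OR of the creation flags, it should equal the flag that was passed in. StreamFlagsDemo is a hypothetical name.

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class StreamFlagsDemo
{
    // Creates a non-blocking stream and returns the flags reported by the driver.
    // Assumption: CUStreamFlags.NonBlocking mirrors CU_STREAM_NON_BLOCKING.
    public static CUStreamFlags Run()
    {
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream(CUStreamFlags.NonBlocking))
        {
            return stream.cuStreamGetFlags();
        }
    }
}
```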
Dispose()
Releases the resources held by this CudaStream.
Declaration
public void Dispose()
Dispose(Boolean)
Implements the dispose pattern; called by Dispose() and by the finalizer.
Declaration
protected virtual void Dispose(bool fDisposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | fDisposing | true when called from Dispose(); false when called from the finalizer. |
Finalize()
Finalizer (part of the dispose pattern).
Declaration
protected void Finalize()
GetPriority()
Queries the priority of this stream.
Declaration
public int GetPriority()
Returns
Type | Description |
---|---|
System.Int32 | the stream's priority |
Query()
Returns true if all operations in the stream have completed, or false if not.
Declaration
public bool Query()
Returns
Type | Description |
---|---|
System.Boolean | true if all operations in the stream have completed; otherwise false. |
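Query() lets the host poll for completion instead of blocking in Synchronize(). This sketch enqueues an asynchronous memset and spins on Query(). It assumes device 0 and that CudaDeviceVariable<T>.MemsetAsync(uint, CUstream) is available in your ManagedCuda version. In real code the loop body would do useful host work rather than just yielding. StreamQueryDemo is a hypothetical name.

```csharp
using System.Threading;
using ManagedCuda;

public static class StreamQueryDemo
{
    // Polls the stream instead of blocking; returns the last element written.
    public static uint Run()
    {
        const int n = 1 << 20;
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var buffer = new CudaDeviceVariable<uint>(n))
        {
            buffer.MemsetAsync(42u, stream.Stream);   // returns immediately

            while (!stream.Query())
            {
                // Placeholder for useful host work while the device is busy.
                Thread.Yield();
            }

            var host = new uint[n];
            buffer.CopyToHost(host);
            return host[n - 1];
        }
    }
}
```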
Synchronize()
Waits until the device has completed all operations in the stream. If the context was created with the BlockingSync flag, the CPU thread will block until the stream is finished with all of its tasks.
Declaration
public void Synchronize()
WaitEvent(CUevent)
Makes a compute stream wait on an event.
Makes all future work submitted to the stream wait until cuevent reports completion before beginning execution. This synchronization is performed efficiently on the device.
The stream waits only for the completion of the most recent host call to Record() on cuevent. Once this call has returned, any functions (including Record() and Dispose()) may be called on cuevent again, and the subsequent calls will not have any effect on this stream.
If this stream is the NULL stream (handle 0), any future work submitted in any stream waits for cuevent to complete before beginning execution. This effectively creates a barrier for all future work submitted to the context.
If Record() has not been called on cuevent, this call acts as if the record has already completed, and so is a functional no-op.
Declaration
public void WaitEvent(CUevent cuevent)
Parameters
Type | Name | Description |
---|---|---|
CUevent | cuevent | The event to wait on. |
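This sketch orders work across two streams with an event. The second memset, enqueued on the consumer stream, cannot start until the first memset on the producer stream has completed, and the host thread does not block until the final Synchronize(). It assumes device 0, ManagedCuda's CudaEvent with Record(CUstream) and an Event property that returns the CUevent handle, and CudaDeviceVariable<T>.MemsetAsync(uint, CUstream). WaitEventDemo is a hypothetical name.

```csharp
using ManagedCuda;

public static class WaitEventDemo
{
    // Returns buffer[0]; it is 2 because the consumer's memset is ordered after the producer's.
    public static uint Run()
    {
        const int n = 1024;
        using (var ctx = new CudaContext(0))
        using (var producer = new CudaStream())
        using (var consumer = new CudaStream())
        using (var ready = new CudaEvent())
        using (var buffer = new CudaDeviceVariable<uint>(n))
        {
            buffer.MemsetAsync(1u, producer.Stream);   // first write, on the producer stream
            ready.Record(producer.Stream);             // mark the point to wait for
            consumer.WaitEvent(ready.Event);           // device-side wait; the host does not block
            buffer.MemsetAsync(2u, consumer.Stream);   // runs only after the first memset
            consumer.Synchronize();

            var host = new uint[n];
            buffer.CopyToHost(host);
            return host[0];
        }
    }
}
```

Without the WaitEvent call, the two memsets would race and the final value could be either 1 or 2.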
WaitValue(CUdeviceptr, UInt32, CUstreamWaitValue_flags)
Waits on a memory location.
Enqueues a synchronization of the stream on the given memory location. Work ordered after the operation blocks until the given condition on the memory is satisfied. By default, the condition is to wait for (int32_t)(*addr - value) >= 0, a cyclic greater-or-equal.
Other condition types can be specified via flags.
If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).
Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. The only requirement for basic support is that, on Windows, the device must be in TCC mode.
Declaration
public void WaitValue(CUdeviceptr addr, uint value, CUstreamWaitValue_flags flags)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | addr | The memory location to wait on. |
System.UInt32 | value | The value to compare with the memory location. |
CUstreamWaitValue_flags | flags | Wait condition and options; see CUstreamWaitValue_flags. |
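The default condition, and the alternatives selectable through flags, can be modelled on the host as plain predicates. This is an illustration, not the device implementation: the class and method names are hypothetical, and the EQ, AND and NOR definitions are taken from the CUDA driver API description of CU_STREAM_WAIT_VALUE_EQ, CU_STREAM_WAIT_VALUE_AND and CU_STREAM_WAIT_VALUE_NOR (NOR requires a newer driver). The cyclic comparison is what lets a counter that has wrapped past uint.MaxValue still count as greater-or-equal.

```csharp
public static class WaitConditions
{
    // Host-side models of the conditions a 32-bit stream wait tests against *addr.

    // GEQ (default): cyclic greater-or-equal, tolerant of 32-bit wraparound.
    public static bool Geq(uint mem, uint value) => unchecked((int)(mem - value)) >= 0;

    // EQ: exact match.
    public static bool Eq(uint mem, uint value) => mem == value;

    // AND: at least one bit of `value` is set in memory.
    public static bool And(uint mem, uint value) => (mem & value) != 0;

    // NOR: at least one bit is clear in both memory and `value`.
    public static bool Nor(uint mem, uint value) => ~(mem | value) != 0;
}
```

For example, Geq(1u, uint.MaxValue) is true: the counter in memory has wrapped around and is logically one step ahead of value.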
WaitValue(CUdeviceptr, UInt64, CUstreamWaitValue_flags)
Waits on a memory location.
Enqueues a synchronization of the stream on the given memory location. Work ordered after the operation blocks until the given condition on the memory is satisfied. By default, the condition is to wait for (int64_t)(*addr - value) >= 0, a cyclic greater-or-equal.
Other condition types can be specified via flags.
If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).
Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS. The requirements are compute capability 7.0 or greater and, on Windows, that the device be in TCC mode.
Declaration
public void WaitValue(CUdeviceptr addr, ulong value, CUstreamWaitValue_flags flags)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | addr | The memory location to wait on. |
System.UInt64 | value | The value to compare with the memory location. |
CUstreamWaitValue_flags | flags | Wait condition and options; see CUstreamWaitValue_flags. |
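For the 64-bit overload, the cyclic comparison is done on 64 bits. The host-side sketch below (hypothetical names, illustration only) contrasts it with a comparison of only the low 32 bits. The 32-bit comparison gives the wrong answer once a counter grows past 2^32, which is the situation that calls for the ulong overload.

```csharp
public static class WaitConditions64
{
    // Default 64-bit condition: (int64_t)(*addr - value) >= 0.
    public static bool Geq(ulong mem, ulong value) => unchecked((long)(mem - value)) >= 0;

    // What a comparison of only the low 32 bits would conclude, for contrast.
    public static bool Geq32OnLowBits(ulong mem, ulong value) =>
        unchecked((int)((uint)mem - (uint)value)) >= 0;
}
```

With mem = 2^32 and value = 1, Geq returns true, while Geq32OnLowBits sees 0 versus 1 and returns false.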
WriteValue(CUdeviceptr, UInt32, CUstreamWriteValue_flags)
Writes a value to memory.
Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is passed, the write is preceded by a system-wide memory fence, equivalent to a __threadfence_system() but scoped to the stream rather than a CUDA thread.
If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).
Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS. The only requirement for basic support is that, on Windows, the device must be in TCC mode.
Declaration
public void WriteValue(CUdeviceptr addr, uint value, CUstreamWriteValue_flags flags)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | addr | The device address to write to. |
System.UInt32 | value | The value to write. |
CUstreamWriteValue_flags | flags | Write options; see CUstreamWriteValue_flags. |
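WriteValue and WaitValue can be combined into a device-side signal between two streams. This sketch assumes that device 0 supports stream memory operations (see CU_DEVICE_ATTRIBUTE_CAN_USE_STREAM_MEM_OPS). It casts 0 to the flag enums because 0 is CU_STREAM_WAIT_VALUE_GEQ and CU_STREAM_WRITE_VALUE_DEFAULT in the CUDA driver API; the named ManagedCuda enum members may differ between versions. StreamFlagSignalDemo is a hypothetical name.

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class StreamFlagSignalDemo
{
    // The producer stream publishes a flag; the consumer stream waits for it on the device.
    public static uint Run()
    {
        using (var ctx = new CudaContext(0))
        using (var producer = new CudaStream())
        using (var consumer = new CudaStream())
        using (var flag = new CudaDeviceVariable<uint>(1))
        {
            flag.CopyToDevice(new uint[] { 0 });   // flag starts at 0

            // Work enqueued on `consumer` after this call waits until *flag >= 1 (GEQ).
            consumer.WaitValue(flag.DevicePointer, 1u, (CUstreamWaitValue_flags)0);

            // Default write: preceded by a memory fence scoped to the producer stream.
            producer.WriteValue(flag.DevicePointer, 1u, (CUstreamWriteValue_flags)0);

            consumer.Synchronize();                // returns once the wait was satisfied
            var host = new uint[1];
            flag.CopyToHost(host);
            return host[0];
        }
    }
}
```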
WriteValue(CUdeviceptr, UInt64, CUstreamWriteValue_flags)
Writes a value to memory.
Unless the CU_STREAM_WRITE_VALUE_NO_MEMORY_BARRIER flag is passed, the write is preceded by a system-wide memory fence, equivalent to a __threadfence_system() but scoped to the stream rather than a CUDA thread.
If the memory was registered via cuMemHostRegister(), the device pointer should be obtained with cuMemHostGetDevicePointer(). This function cannot be used with managed memory (cuMemAllocManaged).
Support for this can be queried with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_CAN_USE_64_BIT_STREAM_MEM_OPS. The requirements are compute capability 7.0 or greater and, on Windows, that the device be in TCC mode.
Declaration
public void WriteValue(CUdeviceptr addr, ulong value, CUstreamWriteValue_flags flags)
Parameters
Type | Name | Description |
---|---|---|
CUdeviceptr | addr | The device address to write to. |
System.UInt64 | value | The value to write. |
CUstreamWriteValue_flags | flags | Write options; see CUstreamWriteValue_flags. |
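The ulong overload is required for values that do not fit in 32 bits. This sketch writes such a value from a stream and reads it back. It assumes a device of compute capability 7.0 or greater with 64-bit stream memory operations enabled, and it again uses (CUstreamWriteValue_flags)0 for the default, fenced write. WriteValue64Demo is a hypothetical name.

```csharp
using ManagedCuda;
using ManagedCuda.BasicTypes;

public static class WriteValue64Demo
{
    // Writes a 64-bit value from the stream and reads it back on the host.
    public static ulong Run()
    {
        const ulong epoch = (1UL << 32) + 5;       // exceeds uint.MaxValue
        using (var ctx = new CudaContext(0))
        using (var stream = new CudaStream())
        using (var counter = new CudaDeviceVariable<ulong>(1))
        {
            // (CUstreamWriteValue_flags)0 == CU_STREAM_WRITE_VALUE_DEFAULT (fenced write).
            stream.WriteValue(counter.DevicePointer, epoch, (CUstreamWriteValue_flags)0);
            stream.Synchronize();

            var host = new ulong[1];
            counter.CopyToHost(host);
            return host[0];
        }
    }
}
```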