
    Class DriverAPINativeMethods.Limits

    Groups all context limit API calls

    Inheritance
    System.Object
    DriverAPINativeMethods.Limits
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: ManagedCuda
    Assembly: ManagedCuda.dll
    Syntax
    public static class Limits

    Methods

    cuCtxGetLimit(ref SizeT, CULimit)

    Returns in pvalue the current size of limit. See CULimit.

    Declaration
    public static CUResult cuCtxGetLimit(ref SizeT pvalue, CULimit limit)
    Parameters
    Type Name Description
    SizeT pvalue

    Returned size in bytes of limit

    CULimit limit

    Limit to query

    Returns
    Type Description
    CUResult

    CUDA Error Codes: Success, ErrorInvalidValue, ErrorUnsupportedLimit. Note that this function may also return error codes from previous, asynchronous launches.
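    A minimal sketch of querying a limit with the declaration above. It assumes a CUDA context is current on the calling thread; here that is arranged with ManagedCuda's CudaContext on device 0, which is an assumption about the surrounding setup rather than part of this API. The class name QueryStackSize is hypothetical.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Hypothetical example class; not part of ManagedCuda.
static class QueryStackSize
{
    static void Main()
    {
        // Assumption: constructing a CudaContext on device 0 makes a
        // context current for this thread until it is disposed.
        using (new CudaContext(0))
        {
            // pvalue is passed by reference and filled in by the driver.
            SizeT value = new SizeT();
            CUResult res = DriverAPINativeMethods.Limits.cuCtxGetLimit(
                ref value, CULimit.StackSize);

            if (res != CUResult.Success)
            {
                // May also be an error left over from an earlier
                // asynchronous launch, as noted above.
                Console.WriteLine("cuCtxGetLimit failed: " + res);
                return;
            }

            Console.WriteLine("Stack size per GPU thread: " + value.ToString() + " bytes");
        }
    }
}
```

    Checking the returned CUResult before reading pvalue matters because pvalue is only meaningful when the call succeeds.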

    cuCtxSetLimit(CULimit, SizeT)

    Setting limit to value is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet hardware requirements (for example, by clamping to minimum or maximum values or rounding up to the nearest element size). The application can use cuCtxGetLimit(ref SizeT, CULimit) to find out exactly what the limit has been set to.

    Each CULimit has its own restrictions when it is set, so each is discussed here:

    Value Restriction
    StackSize

    StackSize controls the stack size of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error ErrorUnsupportedLimit being returned.

    PrintfFIFOSize

    PrintfFIFOSize controls the size of the FIFO used by the printf() device system call. Setting PrintfFIFOSize must be performed before loading any module that uses the printf() device system call, otherwise ErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error ErrorUnsupportedLimit being returned.

    MallocHeapSize

    MallocHeapSize controls the size in bytes of the heap used by the ::malloc() and ::free() device system calls. Setting MallocHeapSize must be performed before launching any kernel that uses the ::malloc() or ::free() device system calls, otherwise ErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error ErrorUnsupportedLimit being returned.

    DevRuntimeSyncDepth

    DevRuntimeSyncDepth controls the maximum nesting depth of a grid at which a thread can safely call ::cudaDeviceSynchronize(). Setting this limit must be performed before any launch of a kernel that uses the device runtime and calls ::cudaDeviceSynchronize() above the default sync depth, two levels of grids. Calls to ::cudaDeviceSynchronize() will fail with error code ::cudaErrorSyncDepthExceeded if the limitation is violated. This limit can be set smaller than the default or up to the maximum launch depth of 24. When setting this limit, keep in mind that additional levels of sync depth require the driver to reserve large amounts of device memory which can no longer be used for user allocations. If these reservations of device memory fail, ::cuCtxSetLimit will return ErrorOutOfMemory, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error ErrorUnsupportedLimit being returned.

    DevRuntimePendingLaunchCount

    DevRuntimePendingLaunchCount controls the maximum number of outstanding device runtime launches that can be made from the current context. A grid is outstanding from the point of launch up until the grid is known to have been completed. Device runtime launches which violate this limitation fail and return ::cudaErrorLaunchPendingCountExceeded when ::cudaGetLastError() is called after launch. If more pending launches than the default (2048 launches) are needed for a module using the device runtime, this limit can be increased. Keep in mind that being able to sustain additional pending launches will require the driver to reserve larger amounts of device memory upfront which can no longer be used for allocations. If these reservations fail, ::cuCtxSetLimit will return ErrorOutOfMemory, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error ErrorUnsupportedLimit being returned.
    Declaration
    public static CUResult cuCtxSetLimit(CULimit limit, SizeT value)
    Parameters
    Type Name Description
    CULimit limit

    Limit to set

    SizeT value

    Size in bytes of limit

    Returns
    Type Description
    CUResult

    CUDA Error Codes: Success, ErrorInvalidValue, ErrorUnsupportedLimit, ErrorOutOfMemory. Note that this function may also return error codes from previous, asynchronous launches.
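    Because the driver may adjust the requested value, a set call is usually followed by a read-back. The sketch below requests a device malloc heap size and then queries what was actually applied. The 64 MiB figure, the class name SetMallocHeap, and the use of CudaContext on device 0 to obtain a current context are illustrative assumptions, not values or setup prescribed by this API.

```csharp
using System;
using ManagedCuda;
using ManagedCuda.BasicTypes;

// Hypothetical example class; not part of ManagedCuda.
static class SetMallocHeap
{
    static void Main()
    {
        // Assumption: a CudaContext on device 0 provides the current context.
        using (new CudaContext(0))
        {
            // Illustrative request: 64 MiB. This must be done before
            // launching any kernel that uses malloc() or free().
            SizeT requested = new SizeT(64L * 1024 * 1024);

            CUResult res = DriverAPINativeMethods.Limits.cuCtxSetLimit(
                CULimit.MallocHeapSize, requested);

            if (res == CUResult.ErrorUnsupportedLimit)
            {
                Console.WriteLine("This limit is not supported on this device.");
                return;
            }
            if (res != CUResult.Success)
            {
                Console.WriteLine("cuCtxSetLimit failed: " + res);
                return;
            }

            // Read the limit back: the driver may have clamped or rounded it.
            SizeT actual = new SizeT();
            res = DriverAPINativeMethods.Limits.cuCtxGetLimit(
                ref actual, CULimit.MallocHeapSize);

            if (res == CUResult.Success)
            {
                Console.WriteLine("Requested " + requested.ToString()
                    + " bytes, driver applied " + actual.ToString() + " bytes");
            }
        }
    }
}
```

    The same set-then-query pattern applies to the other CULimit values. For DevRuntimeSyncDepth and DevRuntimePendingLaunchCount, an ErrorOutOfMemory result can be handled by retrying with a lower value.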
