    Class DriverAPINativeMethods.Occupancy

    This section describes the occupancy calculation functions of the low-level CUDA driver application programming interface.

    Inheritance
    System.Object
    DriverAPINativeMethods.Occupancy
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: ManagedCuda
    Assembly: ManagedCuda.dll
    Syntax
    public static class Occupancy

    Methods

    cuOccupancyMaxActiveBlocksPerMultiprocessor(ref Int32, CUfunction, Int32, SizeT)

    Returns in numBlocks the maximum number of active blocks per streaming multiprocessor.

    Declaration
    public static CUResult cuOccupancyMaxActiveBlocksPerMultiprocessor(ref int numBlocks, CUfunction func, int blockSize, SizeT dynamicSMemSize)
    Parameters
    Type Name Description
    System.Int32 numBlocks

    Returned occupancy

    CUfunction func

    Kernel for which occupancy is calculated

    System.Int32 blockSize

    Block size the kernel is intended to be launched with

    SizeT dynamicSMemSize

    Per-block dynamic shared memory usage intended, in bytes

    Returns
    Type Description
    CUResult

    CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue, ErrorUnknown.
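
    The returned block count is often converted into a theoretical occupancy ratio. The following is a minimal sketch of that arithmetic, assuming the common definition (resident threads divided by the multiprocessor's thread capacity). OccupancyMath, TheoreticalOccupancy, and the maxThreadsPerMultiprocessor parameter are hypothetical names, not part of ManagedCuda, and the device limit has to be obtained separately.

```csharp
using System;

public static class OccupancyMath
{
    // Hypothetical helper, not part of ManagedCuda.
    // numBlocks: value returned through the numBlocks parameter.
    // blockSize: block size that was passed to the occupancy query.
    // maxThreadsPerMultiprocessor: device limit, assumed to be known by the caller.
    public static double TheoreticalOccupancy(int numBlocks, int blockSize, int maxThreadsPerMultiprocessor)
    {
        if (maxThreadsPerMultiprocessor <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxThreadsPerMultiprocessor));

        // Resident threads per multiprocessor divided by its thread capacity.
        return (double)numBlocks * blockSize / maxThreadsPerMultiprocessor;
    }
}
```

    For example, 4 resident blocks of 128 threads on a multiprocessor that holds 2048 threads give a ratio of 0.25.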

    cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(ref Int32, CUfunction, Int32, SizeT, CUoccupancy_flags)

    Returns the occupancy of a function.

    Returns in numBlocks the maximum number of active blocks per streaming multiprocessor.

    The flags parameter controls how special cases are handled. The valid flags are:

    • CU_OCCUPANCY_DEFAULT, which keeps the same default behavior as cuOccupancyMaxActiveBlocksPerMultiprocessor;
    • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, if caching is enabled but per-block SM resource usage would result in zero occupancy, the occupancy calculator by default calculates the occupancy as if caching were disabled. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE makes the occupancy calculator return 0 in such cases. More information about this feature can be found in the "Unified L1/Texture Cache" section of the Maxwell tuning guide.
    Declaration
    public static CUResult cuOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(ref int numBlocks, CUfunction func, int blockSize, SizeT dynamicSMemSize, CUoccupancy_flags flags)
    Parameters
    Type Name Description
    System.Int32 numBlocks

    Returned occupancy

    CUfunction func

    Kernel for which occupancy is calculated

    System.Int32 blockSize

    Block size the kernel is intended to be launched with

    SizeT dynamicSMemSize

    Per-block dynamic shared memory usage intended, in bytes

    CUoccupancy_flags flags

    Requested behavior for the occupancy calculator

    Returns
    Type Description
    CUResult

    cuOccupancyMaxPotentialBlockSize(ref Int32, ref Int32, CUfunction, del_CUoccupancyB2DSize, SizeT, Int32)

    Returns in blockSize a reasonable block size that can achieve the maximum occupancy (or, the maximum number of active warps with the fewest blocks per multiprocessor), and in minGridSize the minimum grid size to achieve the maximum occupancy.

    If blockSizeLimit is 0, the configurator will use the maximum block size permitted by the device / function instead.

    If per-block dynamic shared memory allocation is not needed, the user should leave blockSizeToDynamicSMemSize as null and dynamicSMemSize as 0.

    If per-block dynamic shared memory allocation is needed and the dynamic shared memory size is constant regardless of block size, the size should be passed through dynamicSMemSize, and blockSizeToDynamicSMemSize should be null.

    Otherwise, if the per-block dynamic shared memory size varies with the block size, the user needs to provide a unary function through blockSizeToDynamicSMemSize that computes the dynamic shared memory needed by func for any given block size. In that case, dynamicSMemSize is ignored.
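
    The unary function described above maps a block size to a byte count. Here is a minimal sketch, assuming a kernel that needs one float of dynamic shared memory per thread. SharedMemorySizing, BlockSizeToSMemBytes, and OneFloatPerThread are hypothetical names; the delegate is only a stand-in for del_CUoccupancyB2DSize, whose exact signature is not shown on this page.

```csharp
using System;

public static class SharedMemorySizing
{
    // Hypothetical stand-in for del_CUoccupancyB2DSize (assumed shape:
    // takes a block size, returns a size in bytes).
    public delegate long BlockSizeToSMemBytes(int blockSize);

    // Assumed kernel requirement: one float of dynamic shared memory per thread.
    public static long OneFloatPerThread(int blockSize)
    {
        if (blockSize < 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        return (long)blockSize * sizeof(float);
    }
}
```

    With this mapping, a block of 256 threads would request 1024 bytes. A method with a matching signature could then be passed as the callback in place of a constant dynamicSMemSize.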

    Declaration
    public static CUResult cuOccupancyMaxPotentialBlockSize(ref int minGridSize, ref int blockSize, CUfunction func, del_CUoccupancyB2DSize blockSizeToDynamicSMemSize, SizeT dynamicSMemSize, int blockSizeLimit)
    Parameters
    Type Name Description
    System.Int32 minGridSize

    Returned minimum grid size needed to achieve the maximum occupancy

    System.Int32 blockSize

    Returned maximum block size that can achieve the maximum occupancy

    CUfunction func

    Kernel for which launch configuration is calculated

    del_CUoccupancyB2DSize blockSizeToDynamicSMemSize

    A function that calculates how much per-block dynamic shared memory func uses based on the block size

    SizeT dynamicSMemSize

    Dynamic shared memory usage intended, in bytes

    System.Int32 blockSizeLimit

    The maximum block size func is designed to handle

    Returns
    Type Description
    CUResult

    CUDA Error Codes: Success, ErrorDeinitialized, ErrorNotInitialized, ErrorInvalidContext, ErrorInvalidValue, ErrorUnknown.
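
    The returned minGridSize is the smallest grid that reaches maximum occupancy; a launch that must cover a given number of elements usually derives its actual grid size from the suggested blockSize instead. The following is a minimal sketch of that rounding, assuming one thread per element. LaunchConfig and GridSizeFor are hypothetical names, not part of ManagedCuda.

```csharp
using System;

public static class LaunchConfig
{
    // Hypothetical helper: smallest number of blocks of blockSize threads
    // that covers elementCount elements (ceiling division).
    public static int GridSizeFor(int elementCount, int blockSize)
    {
        if (elementCount < 0)
            throw new ArgumentOutOfRangeException(nameof(elementCount));
        if (blockSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        // Use 64-bit arithmetic so the rounding term cannot overflow.
        return (int)(((long)elementCount + blockSize - 1) / blockSize);
    }
}
```

    For example, 1000 elements with a suggested block size of 256 need 4 blocks, and 1025 elements need 5.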

    cuOccupancyMaxPotentialBlockSizeWithFlags(ref Int32, ref Int32, CUfunction, del_CUoccupancyB2DSize, SizeT, Int32, CUoccupancy_flags)

    Suggests a launch configuration with reasonable occupancy.

    An extended version of cuOccupancyMaxPotentialBlockSize. In addition to the arguments passed to cuOccupancyMaxPotentialBlockSize, cuOccupancyMaxPotentialBlockSizeWithFlags also takes a flags parameter.

    The flags parameter controls how special cases are handled. The valid flags are:

    • CU_OCCUPANCY_DEFAULT, which keeps the same default behavior as cuOccupancyMaxPotentialBlockSize;
    • CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE, which suppresses the default behavior on platforms where global caching affects occupancy. On such platforms, the launch configuration that produces maximal occupancy might not support global caching. Setting CU_OCCUPANCY_DISABLE_CACHING_OVERRIDE guarantees that the produced launch configuration is compatible with global caching, at a potential cost in occupancy. More information about this feature can be found in the "Unified L1/Texture Cache" section of the Maxwell tuning guide.
    Declaration
    public static CUResult cuOccupancyMaxPotentialBlockSizeWithFlags(ref int minGridSize, ref int blockSize, CUfunction func, del_CUoccupancyB2DSize blockSizeToDynamicSMemSize, SizeT dynamicSMemSize, int blockSizeLimit, CUoccupancy_flags flags)
    Parameters
    Type Name Description
    System.Int32 minGridSize

    Returned minimum grid size needed to achieve the maximum occupancy

    System.Int32 blockSize

    Returned maximum block size that can achieve the maximum occupancy

    CUfunction func

    Kernel for which launch configuration is calculated

    del_CUoccupancyB2DSize blockSizeToDynamicSMemSize

    A function that calculates how much per-block dynamic shared memory func uses based on the block size

    SizeT dynamicSMemSize

    Dynamic shared memory usage intended, in bytes

    System.Int32 blockSizeLimit

    The maximum block size func is designed to handle

    CUoccupancy_flags flags

    Options that control how the occupancy calculator handles special cases

    Returns
    Type Description
    CUResult