Class CudaDNNContext
An opaque structure holding the cuDNN library context.
The cuDNN library context must be created using cudnnCreate() and the returned handle must be passed to all subsequent library function calls. The context should be destroyed at the end using cudnnDestroy(). The context is associated with only one GPU device, the current device at the time of the call to cudnnCreate(). However, multiple contexts can be created on the same GPU device.
Inheritance
Implements
Inherited Members
Namespace: ManagedCuda.CudaDNN
Assembly: CudaDNN.dll
Syntax
public class CudaDNNContext : IDisposable
Constructors
CudaDNNContext()
Declaration
public CudaDNNContext()
Properties
Handle
Returns the inner handle.
Declaration
public cudnnHandle Handle { get; }
Property Value
Type | Description |
---|---|
cudnnHandle | |
Methods
ActivationBackward(ActivationDescriptor, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This routine computes the gradient of a neuron activation function.
Declaration
public void ActivationBackward(ActivationDescriptor activationDesc, double alpha, TensorDescriptor yDesc, CudaDeviceVariable<double> y, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx)
Parameters
Type | Name | Description |
---|---|---|
ActivationDescriptor | activationDesc | Handle to the previously created activation descriptor object. |
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
TensorDescriptor | xDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the output tensor descriptor xDesc. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
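The blending of the computed gradient into dx can be reproduced on the CPU for checking results. This is a minimal sketch, not the ManagedCuda implementation: `ReluBackwardReference` is a hypothetical helper, and a ReLU activation is assumed purely for illustration (the real activation is whatever activationDesc specifies).

```csharp
using System;

// Hypothetical CPU reference, not part of ManagedCuda.
// Assumes ReLU: the gradient passes through where x > 0 and is zero elsewhere.
// Computes dx[i] = alpha * (ReLU gradient) + beta * dx[i].
static void ReluBackwardReference(double alpha, double[] dy, double[] x, double beta, double[] dx)
{
    if (dy.Length != x.Length || dx.Length != x.Length)
        throw new ArgumentException("dy, x and dx must have the same length.");
    for (int i = 0; i < x.Length; i++)
    {
        double result = x[i] > 0.0 ? dy[i] : 0.0;   // gradient of ReLU times upstream gradient
        dx[i] = alpha * result + beta * dx[i];       // blend with the prior destination value
    }
}

double[] forwardInput = { -1.0, 0.5, 2.0 };
double[] upstreamGrad = { 1.0, 2.0, 3.0 };
double[] inputGrad = { 4.0, 4.0, 4.0 };              // prior values already stored in dx
ReluBackwardReference(1.0, upstreamGrad, forwardInput, 0.25, inputGrad);
Console.WriteLine(string.Join(", ", inputGrad));
```

With alpha = 1 and beta = 0.25, every prior value 4 contributes 1 to the result, so inputGrad ends up holding the values 1, 3 and 4.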
ActivationBackward(ActivationDescriptor, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This routine computes the gradient of a neuron activation function.
Declaration
public void ActivationBackward(ActivationDescriptor activationDesc, float alpha, TensorDescriptor yDesc, CudaDeviceVariable<float> y, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx)
Parameters
Type | Name | Description |
---|---|---|
ActivationDescriptor | activationDesc | Handle to the previously created activation descriptor object. |
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
TensorDescriptor | xDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the output tensor descriptor xDesc. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
ActivationForward(ActivationDescriptor, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This routine applies a specified neuron activation function element-wise over each input value.
Declaration
public void ActivationForward(ActivationDescriptor activationDesc, double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
ActivationDescriptor | activationDesc | Handle to the previously created activation descriptor object. |
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
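The element-wise forward computation and its alpha/beta blending can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the library's code: `ReluForwardReference` is a hypothetical helper and ReLU stands in for whichever activation activationDesc selects.

```csharp
using System;

// Hypothetical CPU reference, not part of ManagedCuda.
// Assumes ReLU as the activation; computes y[i] = alpha * relu(x[i]) + beta * y[i].
static void ReluForwardReference(double alpha, double[] x, double beta, double[] y)
{
    if (x.Length != y.Length)
        throw new ArgumentException("x and y must have the same length.");
    for (int i = 0; i < x.Length; i++)
    {
        double result = Math.Max(0.0, x[i]);    // activation applied to one input value
        y[i] = alpha * result + beta * y[i];     // blend with the prior destination value
    }
}

double[] input = { -1.0, 0.5, 2.0 };
double[] output = { 10.0, 10.0, 10.0 };          // prior values already stored in y
ReluForwardReference(1.0, input, 0.5, output);
Console.WriteLine(string.Join(", ", output));
```

Passing beta = 0 overwrites the destination with the scaled activation; a non-zero beta accumulates into whatever y already holds.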
ActivationForward(ActivationDescriptor, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This routine applies a specified neuron activation function element-wise over each input value.
Declaration
public void ActivationForward(ActivationDescriptor activationDesc, float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
ActivationDescriptor | activationDesc | Handle to the previously created activation descriptor object. |
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
AddTensor(Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function adds the scaled values of a bias tensor (a) to another tensor (c). Each dimension of the bias tensor a must match the corresponding dimension of the destination tensor c or must be equal to 1. In the latter case, the same value from the bias tensor for those dimensions will be used to blend into the c tensor.
Declaration
public void AddTensor(double alpha, TensorDescriptor aDesc, CudaDeviceVariable<double> a, double beta, TensorDescriptor cDesc, CudaDeviceVariable<double> c)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Scaling factor used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha*srcValue + beta*priorDstValue. |
TensorDescriptor | aDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | a | Pointer to data of the tensor described by the aDesc descriptor. |
System.Double | beta | Scaling factor used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha*srcValue + beta*priorDstValue. |
TensorDescriptor | cDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | c | Pointer to data of the tensor described by the cDesc descriptor. |
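The dimension-matching rule (each bias dimension equals the destination dimension or is 1) can be illustrated with a small CPU reference. This sketch assumes 2-D arrays for brevity, whereas real cuDNN tensors have more dimensions; `AddTensorReference` is a hypothetical helper, not a ManagedCuda method.

```csharp
using System;

// Hypothetical CPU reference for a 2-D case, not part of ManagedCuda.
// c[r, col] = alpha * a[ra, ca] + beta * c[r, col], where the index into `a`
// collapses to 0 along any dimension in which `a` has size 1.
static void AddTensorReference(double alpha, double[,] a, double beta, double[,] c)
{
    for (int d = 0; d < 2; d++)
    {
        if (a.GetLength(d) != c.GetLength(d) && a.GetLength(d) != 1)
            throw new ArgumentException($"Dimension {d} of a must match c or be 1.");
    }
    for (int r = 0; r < c.GetLength(0); r++)
    {
        for (int col = 0; col < c.GetLength(1); col++)
        {
            int ra = a.GetLength(0) == 1 ? 0 : r;      // reuse the single row if size is 1
            int ca = a.GetLength(1) == 1 ? 0 : col;    // reuse the single column if size is 1
            c[r, col] = alpha * a[ra, ca] + beta * c[r, col];
        }
    }
}

double[,] bias = { { 1.0 }, { 2.0 } };                               // shape 2x1
double[,] dest = { { 10.0, 20.0, 30.0 }, { 40.0, 50.0, 60.0 } };     // shape 2x3
AddTensorReference(2.0, bias, 1.0, dest);
Console.WriteLine($"{dest[0, 0]} {dest[0, 2]} {dest[1, 1]}");
```

Here the single bias value of each row is reused across all three columns, so row 0 gains 2 and row 1 gains 4.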
AddTensor(Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function adds the scaled values of a bias tensor (a) to another tensor (c). Each dimension of the bias tensor a must match the corresponding dimension of the destination tensor c or must be equal to 1. In the latter case, the same value from the bias tensor for those dimensions will be used to blend into the c tensor.
Declaration
public void AddTensor(float alpha, TensorDescriptor aDesc, CudaDeviceVariable<float> a, float beta, TensorDescriptor cDesc, CudaDeviceVariable<float> c)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Scaling factor used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha*srcValue + beta*priorDstValue. |
TensorDescriptor | aDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | a | Pointer to data of the tensor described by the aDesc descriptor. |
System.Single | beta | Scaling factor used to blend the source value with the prior value in the destination tensor as follows: dstValue = alpha*srcValue + beta*priorDstValue. |
TensorDescriptor | cDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | c | Pointer to data of the tensor described by the cDesc descriptor. |
BatchNormalizationBackward(cudnnBatchNormMode, Double, Double, Double, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, Double, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>)
This function performs the backward BatchNormalization layer computation.
Declaration
public void BatchNormalizationBackward(cudnnBatchNormMode mode, double alphaDataDiff, double betaDataDiff, double alphaParamDiff, double betaParamDiff, TensorDescriptor xDesc, CudaDeviceVariable<double> x, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx, TensorDescriptor dBnScaleBiasDesc, CudaDeviceVariable<double> bnScale, CudaDeviceVariable<double> dBnScaleResult, CudaDeviceVariable<double> dBnBiasResult, double epsilon, CudaDeviceVariable<double> savedMean, CudaDeviceVariable<double> savedInvVariance)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Double | alphaDataDiff | Scaling factor used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alphaDataDiff*resultValue + betaDataDiff*priorDstValue. |
System.Double | betaDataDiff | Scaling factor used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alphaDataDiff*resultValue + betaDataDiff*priorDstValue. |
System.Double | alphaParamDiff | Scaling factor used to blend the gradient outputs dBnScaleResult and dBnBiasResult with prior values in the destination tensor as follows: dstValue = alphaParamDiff*resultValue + betaParamDiff*priorDstValue. |
System.Double | betaParamDiff | Scaling factor used to blend the gradient outputs dBnScaleResult and dBnBiasResult with prior values in the destination tensor as follows: dstValue = alphaParamDiff*resultValue + betaParamDiff*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Double> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | dyDesc | Tensor descriptor for the layer's backpropagated differential dy (input). |
CudaDeviceVariable<System.Double> | dy | Pointer in device memory for the layer's backpropagated differential dy (input). |
TensorDescriptor | dxDesc | Tensor descriptor for the layer's resulting differential with respect to x, dx (output). |
CudaDeviceVariable<System.Double> | dx | Pointer in device memory for the layer's resulting differential with respect to x, dx (output). |
TensorDescriptor | dBnScaleBiasDesc | Shared tensor descriptor for all 5 tensors below in the argument list (bnScale, dBnScaleResult, dBnBiasResult, savedMean, savedInvVariance). The dimensions for this tensor descriptor depend on the normalization mode. Note: The data type of this tensor descriptor must be 'float' for FP16 and FP32 input tensors, and 'double' for FP64 input tensors. |
CudaDeviceVariable<System.Double> | bnScale | Pointer in device memory for the batch normalization scale parameter (in the original paper, scale is referred to as gamma). Note that the bnBias parameter is not needed for this layer's computation. |
CudaDeviceVariable<System.Double> | dBnScaleResult | Pointer in device memory for the resulting scale differentials computed by this routine. Note that scale and bias gradients are not backpropagated below this layer (since they are dead-end computation DAG nodes). |
CudaDeviceVariable<System.Double> | dBnBiasResult | Pointer in device memory for the resulting bias differentials computed by this routine. Note that scale and bias gradients are not backpropagated below this layer (since they are dead-end computation DAG nodes). |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. Same epsilon value should be used in forward and backward functions. |
CudaDeviceVariable<System.Double> | savedMean | Optional cache parameter holding intermediate results saved during the forward pass. For this to work correctly, the layer's x, bnScale, and bnBias data have to remain unchanged until the backward function is called. Note that both savedMean and savedInvVariance can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small. |
CudaDeviceVariable<System.Double> | savedInvVariance | Optional cache parameter holding intermediate results saved during the forward pass. For this to work correctly, the layer's x, bnScale, and bnBias data have to remain unchanged until the backward function is called. Note that both savedMean and savedInvVariance can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small. |
BatchNormalizationBackward(cudnnBatchNormMode, Single, Single, Single, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, Double, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>)
This function performs the backward BatchNormalization layer computation.
Declaration
public void BatchNormalizationBackward(cudnnBatchNormMode mode, float alphaDataDiff, float betaDataDiff, float alphaParamDiff, float betaParamDiff, TensorDescriptor xDesc, CudaDeviceVariable<float> x, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx, TensorDescriptor dBnScaleBiasDesc, CudaDeviceVariable<float> bnScale, CudaDeviceVariable<float> dBnScaleResult, CudaDeviceVariable<float> dBnBiasResult, double epsilon, CudaDeviceVariable<float> savedMean, CudaDeviceVariable<float> savedInvVariance)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Single | alphaDataDiff | Scaling factor used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alphaDataDiff*resultValue + betaDataDiff*priorDstValue. |
System.Single | betaDataDiff | Scaling factor used to blend the gradient output dx with a prior value in the destination tensor as follows: dstValue = alphaDataDiff*resultValue + betaDataDiff*priorDstValue. |
System.Single | alphaParamDiff | Scaling factor used to blend the gradient outputs dBnScaleResult and dBnBiasResult with prior values in the destination tensor as follows: dstValue = alphaParamDiff*resultValue + betaParamDiff*priorDstValue. |
System.Single | betaParamDiff | Scaling factor used to blend the gradient outputs dBnScaleResult and dBnBiasResult with prior values in the destination tensor as follows: dstValue = alphaParamDiff*resultValue + betaParamDiff*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Single> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | dyDesc | Tensor descriptor for the layer's backpropagated differential dy (input). |
CudaDeviceVariable<System.Single> | dy | Pointer in device memory for the layer's backpropagated differential dy (input). |
TensorDescriptor | dxDesc | Tensor descriptor for the layer's resulting differential with respect to x, dx (output). |
CudaDeviceVariable<System.Single> | dx | Pointer in device memory for the layer's resulting differential with respect to x, dx (output). |
TensorDescriptor | dBnScaleBiasDesc | Shared tensor descriptor for all 5 tensors below in the argument list (bnScale, dBnScaleResult, dBnBiasResult, savedMean, savedInvVariance). The dimensions for this tensor descriptor depend on the normalization mode. Note: The data type of this tensor descriptor must be 'float' for FP16 and FP32 input tensors, and 'double' for FP64 input tensors. |
CudaDeviceVariable<System.Single> | bnScale | Pointer in device memory for the batch normalization scale parameter (in the original paper, scale is referred to as gamma). Note that the bnBias parameter is not needed for this layer's computation. |
CudaDeviceVariable<System.Single> | dBnScaleResult | Pointer in device memory for the resulting scale differentials computed by this routine. Note that scale and bias gradients are not backpropagated below this layer (since they are dead-end computation DAG nodes). |
CudaDeviceVariable<System.Single> | dBnBiasResult | Pointer in device memory for the resulting bias differentials computed by this routine. Note that scale and bias gradients are not backpropagated below this layer (since they are dead-end computation DAG nodes). |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. Same epsilon value should be used in forward and backward functions. |
CudaDeviceVariable<System.Single> | savedMean | Optional cache parameter holding intermediate results saved during the forward pass. For this to work correctly, the layer's x, bnScale, and bnBias data have to remain unchanged until the backward function is called. Note that both savedMean and savedInvVariance can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small. |
CudaDeviceVariable<System.Single> | savedInvVariance | Optional cache parameter holding intermediate results saved during the forward pass. For this to work correctly, the layer's x, bnScale, and bnBias data have to remain unchanged until the backward function is called. Note that both savedMean and savedInvVariance can be NULL, but only at the same time. It is recommended to use this cache since the memory overhead is relatively small. |
BatchNormalizationForwardInference(cudnnBatchNormMode, Double, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, Double)
This function performs the forward BatchNormalization layer computation for the inference phase. This layer is based on the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", S. Ioffe, C. Szegedy, 2015.
Declaration
public void BatchNormalizationForwardInference(cudnnBatchNormMode mode, double alpha, double beta, TensorDescriptor xDesc, CudaDeviceVariable<double> x, TensorDescriptor yDesc, CudaDeviceVariable<double> y, TensorDescriptor bnScaleBiasMeanVarDesc, CudaDeviceVariable<double> bnScale, CudaDeviceVariable<double> bnBias, CudaDeviceVariable<double> estimatedMean, CudaDeviceVariable<double> estimatedVariance, double epsilon)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Double | alpha | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
System.Double | beta | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Double> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | yDesc | Tensor descriptor for the layer's y data. |
CudaDeviceVariable<System.Double> | y | Pointer in device memory for the layer's y data. |
TensorDescriptor | bnScaleBiasMeanVarDesc | Shared tensor descriptor for all 4 tensors below in the argument list (bnScale, bnBias, estimatedMean, estimatedVariance). The dimensions for this tensor descriptor depend on the normalization mode. |
CudaDeviceVariable<System.Double> | bnScale | Pointer in device memory for the batch normalization scale parameters (in the original paper, scale is referred to as gamma). |
CudaDeviceVariable<System.Double> | bnBias | Pointer in device memory for the batch normalization bias parameters (in the original paper, bias is referred to as beta). Note that the bnBias parameter can replace the previous layer's bias parameter for improved efficiency. |
CudaDeviceVariable<System.Double> | estimatedMean | Mean tensor (has the same descriptor as the bias and scale). It is suggested that resultRunningMean from the cudnnBatchNormalizationForwardTraining call accumulated during the training phase be passed as input here. |
CudaDeviceVariable<System.Double> | estimatedVariance | Variance tensor (has the same descriptor as the bias and scale). It is suggested that resultRunningVariance from the cudnnBatchNormalizationForwardTraining call accumulated during the training phase be passed as input here. |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. Same epsilon value should be used in forward and backward functions. |
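How these parameters combine during inference can be sketched with the standard batch normalization formula from the cited paper, y = bnScale*(x - mean)/sqrt(variance + epsilon) + bnBias, followed by the alpha/beta blend. The code below is an assumption-laden CPU illustration, not the library's code: `BatchNormInferenceReference` is a hypothetical helper, and the data is laid out as [sample, channel] with one parameter per channel for brevity.

```csharp
using System;

// Hypothetical CPU reference, not part of ManagedCuda.
// Data layout assumed here: x[sample, channel]; one scale/bias/mean/variance per channel.
static void BatchNormInferenceReference(double alpha, double beta, double[,] x, double[,] y,
    double[] bnScale, double[] bnBias, double[] estimatedMean, double[] estimatedVariance,
    double epsilon)
{
    int samples = x.GetLength(0);
    int channels = x.GetLength(1);
    for (int i = 0; i < samples; i++)
    {
        for (int j = 0; j < channels; j++)
        {
            // Normalize with the statistics estimated during training.
            double normalized = (x[i, j] - estimatedMean[j]) / Math.Sqrt(estimatedVariance[j] + epsilon);
            double result = bnScale[j] * normalized + bnBias[j];
            y[i, j] = alpha * result + beta * y[i, j];   // blend with the prior destination value
        }
    }
}

double[,] input = { { 1.0, 4.0 }, { 3.0, 8.0 } };
double[,] output = new double[2, 2];
double[] scale = { 1.0, 2.0 };
double[] shift = { 0.0, 1.0 };
double[] mean = { 2.0, 6.0 };
double[] variance = { 1.0, 4.0 };
BatchNormInferenceReference(1.0, 0.0, input, output, scale, shift, mean, variance, 1e-5);
Console.WriteLine($"{output[0, 0]:F4} {output[1, 1]:F4}");
```

Because epsilon is small relative to the variances used here, output[0, 0] is close to -1 and output[1, 1] is close to 3.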
BatchNormalizationForwardInference(cudnnBatchNormMode, Single, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, Double)
This function performs the forward BatchNormalization layer computation for the inference phase. This layer is based on the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", S. Ioffe, C. Szegedy, 2015.
Declaration
public void BatchNormalizationForwardInference(cudnnBatchNormMode mode, float alpha, float beta, TensorDescriptor xDesc, CudaDeviceVariable<float> x, TensorDescriptor yDesc, CudaDeviceVariable<float> y, TensorDescriptor bnScaleBiasMeanVarDesc, CudaDeviceVariable<float> bnScale, CudaDeviceVariable<float> bnBias, CudaDeviceVariable<float> estimatedMean, CudaDeviceVariable<float> estimatedVariance, double epsilon)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Single | alpha | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
System.Single | beta | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Single> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | yDesc | Tensor descriptor for the layer's y data. |
CudaDeviceVariable<System.Single> | y | Pointer in device memory for the layer's y data. |
TensorDescriptor | bnScaleBiasMeanVarDesc | Shared tensor descriptor for all 4 tensors below in the argument list (bnScale, bnBias, estimatedMean, estimatedVariance). The dimensions for this tensor descriptor depend on the normalization mode. |
CudaDeviceVariable<System.Single> | bnScale | Pointer in device memory for the batch normalization scale parameters (in the original paper, scale is referred to as gamma). |
CudaDeviceVariable<System.Single> | bnBias | Pointer in device memory for the batch normalization bias parameters (in the original paper, bias is referred to as beta). Note that the bnBias parameter can replace the previous layer's bias parameter for improved efficiency. |
CudaDeviceVariable<System.Single> | estimatedMean | Mean tensor (has the same descriptor as the bias and scale). It is suggested that resultRunningMean from the cudnnBatchNormalizationForwardTraining call accumulated during the training phase be passed as input here. |
CudaDeviceVariable<System.Single> | estimatedVariance | Variance tensor (has the same descriptor as the bias and scale). It is suggested that resultRunningVariance from the cudnnBatchNormalizationForwardTraining call accumulated during the training phase be passed as input here. |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. Same epsilon value should be used in forward and backward functions. |
BatchNormalizationForwardTraining(cudnnBatchNormMode, Double, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, Double, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>, Double, CudaDeviceVariable<Double>, CudaDeviceVariable<Double>)
This function performs the forward BatchNormalization layer computation for the training phase. This layer is based on the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", S. Ioffe, C. Szegedy, 2015.
Declaration
public void BatchNormalizationForwardTraining(cudnnBatchNormMode mode, double alpha, double beta, TensorDescriptor xDesc, CudaDeviceVariable<double> x, TensorDescriptor yDesc, CudaDeviceVariable<double> y, TensorDescriptor bnScaleBiasMeanVarDesc, CudaDeviceVariable<double> bnScale, CudaDeviceVariable<double> bnBias, double exponentialAverageFactor, CudaDeviceVariable<double> resultRunningMean, CudaDeviceVariable<double> resultRunningVariance, double epsilon, CudaDeviceVariable<double> resultSaveMean, CudaDeviceVariable<double> resultSaveVariance)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Double | alpha | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
System.Double | beta | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Double> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | yDesc | Tensor descriptor for the layer's y data. |
CudaDeviceVariable<System.Double> | y | Pointer in device memory for the layer's y data. |
TensorDescriptor | bnScaleBiasMeanVarDesc | Shared tensor descriptor for all 6 tensors below in the argument list (bnScale, bnBias, resultRunningMean, resultRunningVariance, resultSaveMean, resultSaveVariance). The dimensions for this tensor descriptor depend on the normalization mode. |
CudaDeviceVariable<System.Double> | bnScale | Pointer in device memory for the batch normalization scale parameters (in the original paper, scale is referred to as gamma). |
CudaDeviceVariable<System.Double> | bnBias | Pointer in device memory for the batch normalization bias parameters (in the original paper, bias is referred to as beta). Note that the bnBias parameter can replace the previous layer's bias parameter for improved efficiency. |
System.Double | exponentialAverageFactor | Factor used in the moving average computation: runningMean = newMean*factor + runningMean*(1-factor). Use factor = 1/(1+n) at the call that follows n previous calls (so the first call uses factor = 1) to get Cumulative Moving Average (CMA) behavior, CMA[n] = (x[1]+...+x[n])/n, since CMA[n+1] = (n*CMA[n]+x[n+1])/(n+1) = ((n+1)*CMA[n]-CMA[n])/(n+1) + x[n+1]/(n+1) = CMA[n]*(1-1/(n+1)) + x[n+1]*1/(n+1). |
CudaDeviceVariable<System.Double> | resultRunningMean | Running mean tensor (it has the same descriptor as the bias and scale). If this tensor is initially uninitialized, exponentialAverageFactor=1 must be used for the very first call of a complete training cycle. This is necessary to properly initialize the moving average. Both resultRunningMean and resultRunningVariance can be NULL, but only at the same time. |
CudaDeviceVariable<System.Double> | resultRunningVariance | Running variance tensor (it has the same descriptor as the bias and scale). If this tensor is initially uninitialized, exponentialAverageFactor=1 must be used for the very first call of a complete training cycle. This is necessary to properly initialize the moving average. Both resultRunningMean and resultRunningVariance can be NULL, but only at the same time. The value stored in resultRunningVariance (or passed as estimatedVariance in inference mode) is the moving average of variance[x], where the variance is computed either over batch or spatial+batch dimensions depending on the mode. |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. Same epsilon value should be used in forward and backward functions. |
CudaDeviceVariable<System.Double> | resultSaveMean | Optional cache to save intermediate results computed during the forward pass - these can then be reused to speed up the backward pass. For this to work correctly, the bottom layer data has to remain unchanged until the backward function is called. Note that both resultSaveMean and resultSaveInvVariance can be NULL but only at the same time. It is recommended to use this cache since memory overhead is relatively small because these tensors have a much lower product of dimensions than the data tensors. |
CudaDeviceVariable<System.Double> | resultSaveVariance | Optional cache to save intermediate results computed during the forward pass - these can then be reused to speed up the backward pass. For this to work correctly, the bottom layer data has to remain unchanged until the backward function is called. Note that both resultSaveMean and resultSaveInvVariance can be NULL but only at the same time. It is recommended to use this cache since memory overhead is relatively small because these tensors have a much lower product of dimensions than the data tensors. |
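The cumulative moving average identity quoted for exponentialAverageFactor can be written out step by step. This is only a restatement of the formula in the table, where the symbols \mathrm{CMA}_n (for CMA[n]) and x_{n+1} (for the newest batch statistic) are notation chosen here:

```latex
\begin{aligned}
\mathrm{CMA}_{n+1} &= \frac{n\,\mathrm{CMA}_n + x_{n+1}}{n+1} \\
                   &= \frac{(n+1)\,\mathrm{CMA}_n - \mathrm{CMA}_n}{n+1} + \frac{x_{n+1}}{n+1} \\
                   &= \mathrm{CMA}_n\left(1 - \frac{1}{n+1}\right) + x_{n+1}\cdot\frac{1}{n+1}
\end{aligned}
```

Matching the last line against runningMean = newMean*factor + runningMean*(1-factor) gives factor = 1/(n+1), which equals 1 when no previous value exists (n = 0).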
BatchNormalizationForwardTraining(cudnnBatchNormMode, Single, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, Double, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>, Double, CudaDeviceVariable<Single>, CudaDeviceVariable<Single>)
This function performs the forward BatchNormalization layer computation for the training phase. This layer is based on the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", S. Ioffe, C. Szegedy, 2015.
Declaration
public void BatchNormalizationForwardTraining(cudnnBatchNormMode mode, float alpha, float beta, TensorDescriptor xDesc, CudaDeviceVariable<float> x, TensorDescriptor yDesc, CudaDeviceVariable<float> y, TensorDescriptor bnScaleBiasMeanVarDesc, CudaDeviceVariable<float> bnScale, CudaDeviceVariable<float> bnBias, double exponentialAverageFactor, CudaDeviceVariable<float> resultRunningMean, CudaDeviceVariable<float> resultRunningVariance, double epsilon, CudaDeviceVariable<float> resultSaveMean, CudaDeviceVariable<float> resultSaveVariance)
Parameters
Type | Name | Description |
---|---|---|
cudnnBatchNormMode | mode | Mode of operation (spatial or per-activation). |
System.Single | alpha | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
System.Single | beta | Scaling factor used to blend the layer output value with the prior value in the destination tensor as follows: dstValue = alpha*resultValue + beta*priorDstValue. |
TensorDescriptor | xDesc | Tensor descriptor for the layer's x data. |
CudaDeviceVariable<System.Single> | x | Pointer in device memory for the layer's x data. |
TensorDescriptor | yDesc | Tensor descriptor for the layer's y data. |
CudaDeviceVariable<System.Single> | y | Pointer in device memory for the layer's y data. |
TensorDescriptor | bnScaleBiasMeanVarDesc | Shared tensor descriptor for all six tensors below in the argument list. The dimensions of this tensor descriptor depend on the normalization mode. |
CudaDeviceVariable<System.Single> | bnScale | Pointer in device memory for the batch normalization scale parameters (in the original paper, scale is referred to as gamma). |
CudaDeviceVariable<System.Single> | bnBias | Pointer in device memory for the batch normalization bias parameters (in the original paper, bias is referred to as beta). Note that the bnBias parameter can replace the previous layer's bias parameter for improved efficiency. |
System.Double | exponentialAverageFactor | Factor used in the moving average computation runningMean = newMean*factor + runningMean*(1-factor). Use factor = 1/(1+n) at the (n+1)-th call to the function (that is, after n previous calls) to get Cumulative Moving Average (CMA) behavior CMA[n] = (x[1]+...+x[n])/n, since CMA[n+1] = (n*CMA[n]+x[n+1])/(n+1) = ((n+1)*CMA[n]-CMA[n])/(n+1) + x[n+1]/(n+1) = CMA[n]*(1-1/(n+1)) + x[n+1]*1/(n+1). |
CudaDeviceVariable<System.Single> | resultRunningMean | Running mean tensor (it has the same descriptor as the bias and scale). If this tensor is initially uninitialized, exponentialAverageFactor=1 must be used for the very first call of a complete training cycle. This is necessary to properly initialize the moving average. Both resultRunningMean and resultRunningVariance can be NULL, but only at the same time. |
CudaDeviceVariable<System.Single> | resultRunningVariance | Running variance tensor (it has the same descriptor as the bias and scale). If this tensor is initially uninitialized, exponentialAverageFactor=1 must be used for the very first call of a complete training cycle. This is necessary to properly initialize the moving average. Both resultRunningMean and resultRunningVariance can be NULL, but only at the same time. The value stored in this tensor (or passed as an input in inference mode) is the moving average of the expression 1 / sqrt(eps+variance[x]), where the variance is computed either over batch or spatial+batch dimensions depending on the mode. |
System.Double | epsilon | Epsilon value used in the batch normalization formula. Minimum allowed value is currently 1e-5. The same epsilon value should be used in forward and backward functions. |
CudaDeviceVariable<System.Single> | resultSaveMean | Optional cache to save intermediate results computed during the forward pass - these can then be reused to speed up the backward pass. For this to work correctly, the bottom layer data has to remain unchanged until the backward function is called. Note that both resultSaveMean and resultSaveVariance can be NULL, but only at the same time. Using this cache is recommended: the memory overhead is relatively small because these tensors have a much lower product of dimensions than the data tensors. |
CudaDeviceVariable<System.Single> | resultSaveVariance | Optional cache to save intermediate results computed during the forward pass - these can then be reused to speed up the backward pass. For this to work correctly, the bottom layer data has to remain unchanged until the backward function is called. Note that both resultSaveMean and resultSaveVariance can be NULL, but only at the same time. Using this cache is recommended: the memory overhead is relatively small because these tensors have a much lower product of dimensions than the data tensors. |
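As a rough illustration of how exponentialAverageFactor drives the running statistics, the following CPU-only sketch applies the moving average formula from the table to a sequence of scalar values. The class and method names are hypothetical and nothing here calls the library; it only mirrors the arithmetic, assuming the factor is chosen as 1/(1+n) after n previous calls.

```csharp
using System;

// Hypothetical helper, not part of ManagedCuda: mirrors the moving average arithmetic on the CPU.
static class RunningAverageSketch
{
    // One update step: runningMean = newMean*factor + runningMean*(1-factor).
    public static double Update(double runningMean, double newMean, double factor)
    {
        return newMean * factor + runningMean * (1.0 - factor);
    }

    // Feeds the samples one at a time with factor = 1/(1+n), where n counts previous calls.
    public static double CumulativeAverage(double[] samples)
    {
        double running = 0.0; // Fully overwritten by the first call, because factor is 1 there.
        for (int n = 0; n < samples.Length; n++)
        {
            double factor = 1.0 / (1.0 + n);
            running = Update(running, samples[n], factor);
        }
        return running;
    }
}
```

With this choice of factor, the result after each call is the plain arithmetic mean of the values seen so far, which is why the first call needs factor 1 when the running tensors are uninitialized.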
ConvolutionBackwardBias(Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function computes the convolution gradient with respect to the bias, which is the sum of every element belonging to the same feature map across all of the images of the input tensor. Therefore, the number of elements produced is equal to the number of feature maps of the input tensor.
Declaration
public void ConvolutionBackwardBias(double alpha, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, double beta, TensorDescriptor dbDesc, CudaDeviceVariable<double> db)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dyDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dbDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | db | Data pointer to GPU memory associated with the output tensor descriptor dbDesc. |
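The bias gradient described above is a per-feature-map sum. A minimal CPU reference, assuming dy is stored as a flat NCHW array (the layout, class name, and method name are assumptions made for this sketch, not library API), could look like this:

```csharp
using System;

// Hypothetical CPU reference, not part of ManagedCuda.
static class BiasGradientSketch
{
    // Sums dy over images (n) and spatial positions (h, w) for each feature map (c),
    // then blends with the prior db values: db[c] = alpha*sum + beta*db[c].
    public static double[] Compute(double alpha, double[] dy, int n, int c, int h, int w,
                                   double beta, double[] priorDb)
    {
        var db = new double[c];
        for (int ci = 0; ci < c; ci++)
        {
            double sum = 0.0;
            for (int ni = 0; ni < n; ni++)
                for (int hi = 0; hi < h; hi++)
                    for (int wi = 0; wi < w; wi++)
                        sum += dy[((ni * c + ci) * h + hi) * w + wi]; // NCHW indexing (assumed)
            db[ci] = alpha * sum + beta * priorDb[ci];
        }
        return db;
    }
}
```

The output has one element per feature map, matching the statement that the number of elements produced equals the number of feature maps.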
ConvolutionBackwardBias(Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function computes the convolution gradient with respect to the bias, which is the sum of every element belonging to the same feature map across all of the images of the input tensor. Therefore, the number of elements produced is equal to the number of feature maps of the input tensor.
Declaration
public void ConvolutionBackwardBias(float alpha, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, float beta, TensorDescriptor dbDesc, CudaDeviceVariable<float> db)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dyDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dbDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | db | Data pointer to GPU memory associated with the output tensor descriptor dbDesc. |
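The alpha/beta blending rule that recurs in these parameter tables can be sketched element-wise on the CPU. The helper below is hypothetical and only restates dstValue = alpha*result + beta*priorDstValue; it makes no claim about how the library applies the rule internally.

```csharp
using System;

// Hypothetical helper, not part of ManagedCuda.
static class BlendSketch
{
    // Returns alpha*result[i] + beta*priorDst[i] for every element.
    public static double[] Blend(double alpha, double[] result, double beta, double[] priorDst)
    {
        var dst = new double[result.Length];
        for (int i = 0; i < result.Length; i++)
            dst[i] = alpha * result[i] + beta * priorDst[i];
        return dst;
    }
}
```

Under this rule, alpha = 1 and beta = 0 overwrite the destination with the new result, while beta = 1 accumulates into it.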
ConvolutionBackwardData(Double, FilterDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, ConvolutionDescriptor, cudnnConvolutionBwdDataAlgo, CudaDeviceVariable<Byte>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function computes the convolution gradient with respect to the output tensor using the specified algo, returning results in dx. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionBackwardData(double alpha, FilterDescriptor wDesc, CudaDeviceVariable<double> w, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, ConvolutionDescriptor convDesc, cudnnConvolutionBwdDataAlgo algo, CudaDeviceVariable<byte> workSpace, double beta, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Double> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionBwdDataAlgo | algo | Enumerant that specifies which backward data convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc that carries the result. |
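To make the data gradient concrete, here is a deliberately simplified CPU sketch for a single-channel 1D cross-correlation with stride 1 and no padding, where the forward pass is assumed to be y[o] = sum over k of x[o+k]*w[k]. The names are hypothetical and the sketch ignores alpha, beta, algorithms, and workspaces; it only shows which dy and w values contribute to each dx element.

```csharp
using System;

// Hypothetical CPU reference for a 1D, single-channel case; not part of ManagedCuda.
static class BackwardDataSketch
{
    // Requires dy.Length == inputLength - w.Length + 1 (stride 1, no padding).
    public static double[] Compute(double[] w, double[] dy, int inputLength)
    {
        var dx = new double[inputLength];
        for (int o = 0; o < dy.Length; o++)
            for (int k = 0; k < w.Length; k++)
                dx[o + k] += dy[o] * w[k]; // x[o+k] fed y[o] through weight w[k]
        return dx;
    }
}
```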
ConvolutionBackwardData(Single, FilterDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, ConvolutionDescriptor, cudnnConvolutionBwdDataAlgo, CudaDeviceVariable<Byte>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function computes the convolution gradient with respect to the output tensor using the specified algo, returning results in dx. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionBackwardData(float alpha, FilterDescriptor wDesc, CudaDeviceVariable<float> w, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, ConvolutionDescriptor convDesc, cudnnConvolutionBwdDataAlgo algo, CudaDeviceVariable<byte> workSpace, float beta, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Single> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionBwdDataAlgo | algo | Enumerant that specifies which backward data convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc that carries the result. |
ConvolutionBackwardFilter(Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, ConvolutionDescriptor, cudnnConvolutionBwdFilterAlgo, CudaDeviceVariable<Byte>, Double, FilterDescriptor, CudaDeviceVariable<Double>)
This function computes the convolution gradient with respect to filter coefficients using the specified algo, returning results in dw. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionBackwardFilter(double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, ConvolutionDescriptor convDesc, cudnnConvolutionBwdFilterAlgo algo, CudaDeviceVariable<byte> workSpace, double beta, FilterDescriptor dwDesc, CudaDeviceVariable<double> dw)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionBwdFilterAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
FilterDescriptor | dwDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Double> | dw | Data pointer to GPU memory associated with the filter descriptor dwDesc that carries the result. |
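The filter gradient can be illustrated with the same simplified setting: a single-channel 1D cross-correlation with stride 1 and no padding, assuming the forward pass y[o] = sum over k of x[o+k]*w[k]. The sketch below uses hypothetical names, leaves out alpha, beta, algorithm selection, and workspace handling, and is meant only to show the shape of the computation.

```csharp
using System;

// Hypothetical CPU reference for a 1D, single-channel case; not part of ManagedCuda.
static class BackwardFilterSketch
{
    // dw[k] accumulates x[o+k]*dy[o] over all output positions o.
    // Requires dy.Length == x.Length - filterLength + 1 (stride 1, no padding).
    public static double[] Compute(double[] x, double[] dy, int filterLength)
    {
        var dw = new double[filterLength];
        for (int k = 0; k < filterLength; k++)
            for (int o = 0; o < dy.Length; o++)
                dw[k] += x[o + k] * dy[o];
        return dw;
    }
}
```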
ConvolutionBackwardFilter(Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, ConvolutionDescriptor, cudnnConvolutionBwdFilterAlgo, CudaDeviceVariable<Byte>, Single, FilterDescriptor, CudaDeviceVariable<Single>)
This function computes the convolution gradient with respect to filter coefficients using the specified algo, returning results in dw. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionBackwardFilter(float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, ConvolutionDescriptor convDesc, cudnnConvolutionBwdFilterAlgo algo, CudaDeviceVariable<byte> workSpace, float beta, FilterDescriptor dwDesc, CudaDeviceVariable<float> dw)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the input differential tensor descriptor dyDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionBwdFilterAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
FilterDescriptor | dwDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Single> | dw | Data pointer to GPU memory associated with the filter descriptor dwDesc that carries the result. |
ConvolutionBiasActivationForward(Double, TensorDescriptor, CudaDeviceVariable<Double>, FilterDescriptor, CudaDeviceVariable<Double>, ConvolutionDescriptor, cudnnConvolutionFwdAlgo, CudaDeviceVariable<Byte>, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, ActivationDescriptor, TensorDescriptor, CudaDeviceVariable<Double>)
This function applies a bias and then an activation to the convolutions or cross-correlations of cudnnConvolutionForward(), returning results in y. The full computation follows the equation y = act(alpha1 * conv(x) + alpha2 * z + bias).
The routine cudnnGetConvolution2dForwardOutputDim or cudnnGetConvolutionNdForwardOutputDim can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc and wDesc.
Declaration
public void ConvolutionBiasActivationForward(double alpha1, TensorDescriptor xDesc, CudaDeviceVariable<double> x, FilterDescriptor wDesc, CudaDeviceVariable<double> w, ConvolutionDescriptor convDesc, cudnnConvolutionFwdAlgo algo, CudaDeviceVariable<byte> workSpace, double alpha2, TensorDescriptor zDesc, CudaDeviceVariable<double> z, TensorDescriptor biasDesc, CudaDeviceVariable<double> bias, ActivationDescriptor activationDesc, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha1 | Scaling factor applied to the convolution result, as described by the equation above. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Double> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionFwdAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Double | alpha2 | Scaling factor applied to the tensor z, as described by the equation above. |
TensorDescriptor | zDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | z | Data pointer to GPU memory associated with the tensor descriptor zDesc. |
TensorDescriptor | biasDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | bias | Data pointer to GPU memory associated with the tensor descriptor biasDesc. |
ActivationDescriptor | activationDesc | Handle to a previously initialized activation descriptor. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution. |
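The equation y = act(alpha1 * conv(x) + alpha2 * z + bias) can be sketched on the CPU once the convolution result is available. The helper below is hypothetical: it assumes the convolution output and z are flat, channel-major arrays, that bias holds one value per channel, and that the activation is ReLU (the actual activation depends on activationDesc).

```csharp
using System;

// Hypothetical CPU sketch of the fused epilogue; not part of ManagedCuda.
static class FusedEpilogueSketch
{
    // convResult and z hold channels*spatialSize values, channel-major; bias has one value per channel.
    public static double[] Apply(double alpha1, double[] convResult, double alpha2, double[] z,
                                 double[] bias, int spatialSize)
    {
        var y = new double[convResult.Length];
        for (int i = 0; i < y.Length; i++)
        {
            int channel = i / spatialSize;
            double pre = alpha1 * convResult[i] + alpha2 * z[i] + bias[channel];
            y[i] = Math.Max(0.0, pre); // ReLU is an assumption made for this sketch
        }
        return y;
    }
}
```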
ConvolutionBiasActivationForward(Single, TensorDescriptor, CudaDeviceVariable<Single>, FilterDescriptor, CudaDeviceVariable<Single>, ConvolutionDescriptor, cudnnConvolutionFwdAlgo, CudaDeviceVariable<Byte>, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, ActivationDescriptor, TensorDescriptor, CudaDeviceVariable<Single>)
This function applies a bias and then an activation to the convolutions or cross-correlations of cudnnConvolutionForward(), returning results in y. The full computation follows the equation y = act(alpha1 * conv(x) + alpha2 * z + bias).
The routine cudnnGetConvolution2dForwardOutputDim or cudnnGetConvolutionNdForwardOutputDim can be used to determine the proper dimensions of the output tensor descriptor yDesc with respect to xDesc, convDesc and wDesc.
Declaration
public void ConvolutionBiasActivationForward(float alpha1, TensorDescriptor xDesc, CudaDeviceVariable<float> x, FilterDescriptor wDesc, CudaDeviceVariable<float> w, ConvolutionDescriptor convDesc, cudnnConvolutionFwdAlgo algo, CudaDeviceVariable<byte> workSpace, float alpha2, TensorDescriptor zDesc, CudaDeviceVariable<float> z, TensorDescriptor biasDesc, CudaDeviceVariable<float> bias, ActivationDescriptor activationDesc, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha1 | Scaling factor applied to the convolution result, as described by the equation above. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Single> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionFwdAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Single | alpha2 | Scaling factor applied to the tensor z, as described by the equation above. |
TensorDescriptor | zDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | z | Data pointer to GPU memory associated with the tensor descriptor zDesc. |
TensorDescriptor | biasDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | bias | Data pointer to GPU memory associated with the tensor descriptor biasDesc. |
ActivationDescriptor | activationDesc | Handle to a previously initialized activation descriptor. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution. |
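The description above points to cudnnGetConvolution2dForwardOutputDim for sizing yDesc. As a rough guide, the per-dimension formula given in the cuDNN documentation for that routine is outputDim = 1 + (inputDim + 2*pad - (((filterDim-1)*dilation)+1))/stride, using integer division. The helper below is a hypothetical CPU restatement of that formula, not a wrapper around the library call.

```csharp
using System;

// Hypothetical helper restating the documented output-size formula; not part of ManagedCuda.
static class ConvOutputDimSketch
{
    // Computes one output dimension (height or width) with integer division.
    public static int OutputDim(int inputDim, int pad, int filterDim, int stride, int dilation)
    {
        int effectiveFilter = (filterDim - 1) * dilation + 1;
        return 1 + (inputDim + 2 * pad - effectiveFilter) / stride;
    }
}
```

For example, a 3-wide filter with pad 1, stride 1, and dilation 1 leaves the spatial size unchanged.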
ConvolutionForward(Double, TensorDescriptor, CudaDeviceVariable<Double>, FilterDescriptor, CudaDeviceVariable<Double>, ConvolutionDescriptor, cudnnConvolutionFwdAlgo, CudaDeviceVariable<Byte>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function executes convolutions or cross-correlations over x using the specified filters, returning results in y. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionForward(double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, FilterDescriptor wDesc, CudaDeviceVariable<double> w, ConvolutionDescriptor convDesc, cudnnConvolutionFwdAlgo algo, CudaDeviceVariable<byte> workSpace, double beta, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Double> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionFwdAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Double | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution. |
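As a small illustration of the forward computation together with alpha/beta blending, the following CPU sketch runs a single-channel 1D cross-correlation with stride 1 and no padding. The names are hypothetical and the reduced setting is an assumption made to keep the example short; the library itself works on full tensors described by the descriptors above.

```csharp
using System;

// Hypothetical CPU reference for a 1D, single-channel case; not part of ManagedCuda.
static class ForwardConvSketch
{
    // y[o] = alpha * sum_k x[o+k]*w[k] + beta * priorY[o], with priorY.Length == x.Length - w.Length + 1.
    public static double[] Compute(double alpha, double[] x, double[] w, double beta, double[] priorY)
    {
        var y = new double[priorY.Length];
        for (int o = 0; o < y.Length; o++)
        {
            double acc = 0.0;
            for (int k = 0; k < w.Length; k++)
                acc += x[o + k] * w[k]; // cross-correlation: the filter is not flipped
            y[o] = alpha * acc + beta * priorY[o];
        }
        return y;
    }
}
```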
ConvolutionForward(Single, TensorDescriptor, CudaDeviceVariable<Single>, FilterDescriptor, CudaDeviceVariable<Single>, ConvolutionDescriptor, cudnnConvolutionFwdAlgo, CudaDeviceVariable<Byte>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function executes convolutions or cross-correlations over x using the specified filters, returning results in y. Scaling factors alpha and beta can be used to scale the input tensor and the output tensor respectively.
Declaration
public void ConvolutionForward(float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, FilterDescriptor wDesc, CudaDeviceVariable<float> w, ConvolutionDescriptor convDesc, cudnnConvolutionFwdAlgo algo, CudaDeviceVariable<byte> workSpace, float beta, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
CudaDeviceVariable<System.Single> | w | Data pointer to GPU memory associated with the filter descriptor wDesc. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
cudnnConvolutionFwdAlgo | algo | Enumerant that specifies which convolution algorithm should be used to compute the results. |
CudaDeviceVariable<System.Byte> | workSpace | Data pointer to GPU memory for the workspace needed to execute the specified algorithm. If no workspace is needed for a particular algorithm, this pointer can be null. |
System.Single | beta | Scaling factor used to blend the computation result with the prior value in the output layer as follows: dstValue = alpha*result + beta*priorDstValue. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc that carries the result of the convolution. |
DeriveBNTensorDescriptor(TensorDescriptor, TensorDescriptor, cudnnBatchNormMode)
Derives a tensor descriptor from the layer data descriptor for the BatchNormalization scale, invVariance, bnBias, and bnScale tensors. Use this tensor descriptor for bnScaleBiasMeanVarDesc and bnScaleBiasDiffDesc in the Batch Normalization forward and backward functions.
Declaration
public void DeriveBNTensorDescriptor(TensorDescriptor derivedBnDesc, TensorDescriptor xDesc, cudnnBatchNormMode mode)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | derivedBnDesc | |
TensorDescriptor | xDesc | |
cudnnBatchNormMode | mode |
Dispose()
Dispose
Declaration
public void Dispose()
Dispose(Boolean)
For IDisposable
Declaration
protected virtual void Dispose(bool fDisposing)
Parameters
Type | Name | Description |
---|---|---|
System.Boolean | fDisposing |
DropoutBackward(DropoutDescriptor, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, CudaDeviceVariable<Byte>)
This function performs the backward dropout operation over dy, returning the results in dx. If, during the
forward dropout operation, a value from x was propagated to y, then during the backward operation the value
from dy is propagated to dx; otherwise, the dx value is set to 0.
Declaration
public void DropoutBackward(DropoutDescriptor dropoutDesc, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx, CudaDeviceVariable<byte> reserveSpace)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Handle to a previously created dropout descriptor object. |
TensorDescriptor | dyDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Pointer to data of the tensor described by the dyDesc descriptor. |
TensorDescriptor | dxDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | dx | Pointer to data of the tensor described by the dxDesc descriptor. |
CudaDeviceVariable<System.Byte> | reserveSpace | Data pointer to GPU memory used by this function. It is expected that the contents of reserveSpace do not change between cudnnDropoutForward and cudnnDropoutBackward calls. |
DropoutBackward(DropoutDescriptor, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, CudaDeviceVariable<Byte>)
This function performs the backward dropout operation over dy, returning the results in dx. If, during the
forward dropout operation, a value from x was propagated to y, then during the backward operation the value
from dy is propagated to dx; otherwise, the dx value is set to 0.
Declaration
public void DropoutBackward(DropoutDescriptor dropoutDesc, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx, CudaDeviceVariable<byte> reserveSpace)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Handle to a previously created dropout descriptor object. |
TensorDescriptor | dyDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Pointer to data of the tensor described by the dyDesc descriptor. |
TensorDescriptor | dxDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | dx | Pointer to data of the tensor described by the dxDesc descriptor. |
CudaDeviceVariable<System.Byte> | reserveSpace | Data pointer to GPU memory used by this function. It is expected that the contents of reserveSpace do not change between cudnnDropoutForward and cudnnDropoutBackward calls. |
DropoutForward(DropoutDescriptor, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, CudaDeviceVariable<Byte>)
This function performs the forward dropout operation over x, returning the results in y. If dropout was
used as a parameter to cudnnSetDropoutDescriptor, then approximately a dropout fraction of the x values
will be replaced by 0, and the rest will be scaled by 1/(1-dropout). This function should not
run concurrently with another cudnnDropoutForward call using the same states.
Declaration
public void DropoutForward(DropoutDescriptor dropoutDesc, TensorDescriptor xDesc, CudaDeviceVariable<double> x, TensorDescriptor yDesc, CudaDeviceVariable<double> y, CudaDeviceVariable<byte> reserveSpace)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Handle to a previously created dropout descriptor object. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
CudaDeviceVariable<System.Byte> | reserveSpace | Data pointer to GPU memory used by this function. It is expected that the contents of reserveSpace do not change between cudnnDropoutForward and cudnnDropoutBackward calls. |
DropoutForward(DropoutDescriptor, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, CudaDeviceVariable<Byte>)
This function performs the forward dropout operation over x, returning the results in y. If dropout was
used as a parameter to cudnnSetDropoutDescriptor, then approximately a dropout fraction of the x values
will be replaced by 0, and the rest will be scaled by 1/(1-dropout). This function should not
run concurrently with another cudnnDropoutForward call using the same states.
Declaration
public void DropoutForward(DropoutDescriptor dropoutDesc, TensorDescriptor xDesc, CudaDeviceVariable<float> x, TensorDescriptor yDesc, CudaDeviceVariable<float> y, CudaDeviceVariable<byte> reserveSpace)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Handle to a previously created dropout descriptor object. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
CudaDeviceVariable<System.Byte> | reserveSpace | Data pointer to GPU memory used by this function. It is expected that the contents of reserveSpace do not change between cudnnDropoutForward and cudnnDropoutBackward calls. |
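The forward scaling and the backward masking described for the dropout functions can be sketched on the CPU. This is an illustration under stated assumptions, not the cuDNN implementation: `DropoutSketch` is a hypothetical name, `System.Random` replaces the library's own random number generator states, the `bool[]` mask only plays the role of reserveSpace (whose real layout is opaque), and scaling the gradient by 1/(1-dropout) in the backward pass is an assumption derived from the forward scaling.

```csharp
using System;

public static class DropoutSketch
{
    // Forward: zero each element with probability `dropout` and scale the rest by 1/(1-dropout).
    // The returned mask records which elements were kept (stand-in for reserveSpace).
    public static bool[] Forward(float[] x, float[] y, float dropout, Random rng)
    {
        if (dropout < 0f || dropout >= 1f)
            throw new ArgumentOutOfRangeException(nameof(dropout));

        float scale = 1f / (1f - dropout);
        var kept = new bool[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            kept[i] = rng.NextDouble() >= dropout;
            y[i] = kept[i] ? x[i] * scale : 0f;
        }
        return kept;
    }

    // Backward: propagate dy to dx where the forward pass kept the value, otherwise write 0.
    // Assumption: the propagated gradient uses the same 1/(1-dropout) scale as the forward pass.
    public static void Backward(float[] dy, float[] dx, bool[] kept, float dropout)
    {
        float scale = 1f / (1f - dropout);
        for (int i = 0; i < dy.Length; i++)
            dx[i] = kept[i] ? dy[i] * scale : 0f;
    }
}
```

The mask must be the same object in both calls, which mirrors the requirement that the contents of reserveSpace stay unchanged between the forward and backward calls.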
Finalize()
For dispose
Declaration
protected void Finalize()
FindConvolutionBackwardDataAlgorithm(FilterDescriptor, TensorDescriptor, ConvolutionDescriptor, TensorDescriptor, Int32)
This function attempts all cuDNN algorithms for cudnnConvolutionBackwardData_v3 and outputs performance metrics to a user-allocated array of cudnnConvolutionBwdDataAlgoPerf_t. These metrics are written in sorted fashion where the first element has the lowest compute time.
Declaration
public cudnnConvolutionBwdDataAlgoPerf[] FindConvolutionBackwardDataAlgorithm(FilterDescriptor wDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, TensorDescriptor dxDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdDataAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
FindConvolutionBackwardFilterAlgorithm(TensorDescriptor, TensorDescriptor, ConvolutionDescriptor, FilterDescriptor, Int32)
This function attempts all cuDNN algorithms for cudnnConvolutionBackwardFilter_v3 and outputs performance metrics to a user-allocated array of cudnnConvolutionBwdFilterAlgoPerf_t. These metrics are written in sorted fashion where the first element has the lowest compute time.
Declaration
public cudnnConvolutionBwdFilterAlgoPerf[] FindConvolutionBackwardFilterAlgorithm(TensorDescriptor xDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, FilterDescriptor dwDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
FilterDescriptor | dwDesc | Handle to a previously initialized filter descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdFilterAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
FindConvolutionForwardAlgorithm(TensorDescriptor, FilterDescriptor, ConvolutionDescriptor, TensorDescriptor, Int32)
This function attempts all cuDNN algorithms and outputs performance metrics to a user-allocated array of cudnnConvolutionFwdAlgoPerf_t. These metrics are written in sorted fashion where the first element has the lowest compute time.
Declaration
public cudnnConvolutionFwdAlgoPerf[] FindConvolutionForwardAlgorithm(TensorDescriptor srcDesc, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, TensorDescriptor destDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | srcDesc | Handle to the previously initialized input tensor descriptor. |
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | destDesc | Handle to the previously initialized output tensor descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionFwdAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
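Because the results are sorted ascending by compute time, choosing an algorithm amounts to taking the first entry that is usable. The sketch below shows that selection under an optional workspace budget. `PerfEntry` is a hypothetical stand-in for one element of the returned array; its property names are assumptions made for this example and are not the fields of cudnnConvolutionFwdAlgoPerf.

```csharp
using System;
using System.Collections.Generic;

public static class AlgoPickSketch
{
    // Hypothetical stand-in for one performance result (names are assumptions).
    public sealed class PerfEntry
    {
        public PerfEntry(bool succeeded, float timeMs, long workspaceBytes)
        {
            Succeeded = succeeded;
            TimeMs = timeMs;
            WorkspaceBytes = workspaceBytes;
        }

        public bool Succeeded { get; }
        public float TimeMs { get; }
        public long WorkspaceBytes { get; }
    }

    // Returns the index of the fastest usable entry, or -1 if none qualifies.
    // The list is assumed to be sorted ascending by compute time already.
    public static int PickFastest(IReadOnlyList<PerfEntry> sorted, long workspaceLimitBytes)
    {
        for (int i = 0; i < sorted.Count; i++)
        {
            if (sorted[i].Succeeded && sorted[i].WorkspaceBytes <= workspaceLimitBytes)
                return i;
        }
        return -1;
    }
}
```

Checking the workspace requirement here is a design choice for memory-constrained callers; when memory is not a concern, index 0 of a successful result is the fastest candidate.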
GetConvolutionBackwardDataAlgorithm(FilterDescriptor, TensorDescriptor, ConvolutionDescriptor, TensorDescriptor, cudnnConvolutionBwdDataPreference, SizeT)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardData_v3 for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionBackwardDataAlgorithm.
Declaration
public cudnnConvolutionBwdDataAlgo GetConvolutionBackwardDataAlgorithm(FilterDescriptor wDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, TensorDescriptor dxDesc, cudnnConvolutionBwdDataPreference preference, SizeT memoryLimitInbytes)
Parameters
Type | Name | Description |
---|---|---|
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
cudnnConvolutionBwdDataPreference | preference | Enumerant to express the preference criteria in terms of memory requirement and speed. |
SizeT | memoryLimitInbytes | Specifies the maximum amount of GPU memory the user is willing to use as a workspace. This is currently a placeholder and is not used. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdDataAlgo | Enumerant that specifies which convolution algorithm should be used to compute the results according to the specified preference |
GetConvolutionBackwardDataAlgorithm(FilterDescriptor, TensorDescriptor, ConvolutionDescriptor, TensorDescriptor, Int32)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardData for the given layer specifications. This function will return all algorithms sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionBackwardDataAlgorithm.
Declaration
public cudnnConvolutionBwdDataAlgoPerf[] GetConvolutionBackwardDataAlgorithm(FilterDescriptor filterDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, TensorDescriptor dxDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdDataAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
GetConvolutionBackwardDataAlgorithmMaxCount()
Declaration
public int GetConvolutionBackwardDataAlgorithmMaxCount()
Returns
Type | Description |
---|---|
System.Int32 |
GetConvolutionBackwardDataWorkspaceSize(FilterDescriptor, TensorDescriptor, ConvolutionDescriptor, TensorDescriptor, cudnnConvolutionBwdDataAlgo)
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardData_v3 with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardData_v3. The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardDataAlgorithm or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
Declaration
public SizeT GetConvolutionBackwardDataWorkspaceSize(FilterDescriptor wDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, TensorDescriptor dxDesc, cudnnConvolutionBwdDataAlgo algo)
Parameters
Type | Name | Description |
---|---|---|
FilterDescriptor | wDesc | Handle to a previously initialized filter descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | dxDesc | Handle to the previously initialized output tensor descriptor. |
cudnnConvolutionBwdDataAlgo | algo | Enumerant that specifies the chosen convolution algorithm |
Returns
Type | Description |
---|---|
SizeT |
GetConvolutionBackwardFilterAlgorithm(TensorDescriptor, TensorDescriptor, ConvolutionDescriptor, FilterDescriptor, cudnnConvolutionBwdFilterPreference, SizeT)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardFilter_v3 for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionBackwardFilterAlgorithm.
Declaration
public cudnnConvolutionBwdFilterAlgo GetConvolutionBackwardFilterAlgorithm(TensorDescriptor xDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, FilterDescriptor dwDesc, cudnnConvolutionBwdFilterPreference preference, SizeT memoryLimitInbytes)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
FilterDescriptor | dwDesc | Handle to a previously initialized filter descriptor. |
cudnnConvolutionBwdFilterPreference | preference | Enumerant to express the preference criteria in terms of memory requirement and speed. |
SizeT | memoryLimitInbytes | Specifies the maximum amount of GPU memory the user is willing to use as a workspace. This is currently a placeholder and is not used. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdFilterAlgo | Enumerant that specifies which convolution algorithm should be used to compute the results according to the specified preference |
GetConvolutionBackwardFilterAlgorithm(TensorDescriptor, TensorDescriptor, FilterDescriptor, ConvolutionDescriptor, Int32)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionBackwardFilter for the given layer specifications. This function will return all algorithms sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionBackwardFilterAlgorithm.
Declaration
public cudnnConvolutionBwdFilterAlgoPerf[] GetConvolutionBackwardFilterAlgorithm(TensorDescriptor xDesc, TensorDescriptor dyDesc, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionBwdFilterAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
GetConvolutionBackwardFilterAlgorithmMaxCount()
Declaration
public int GetConvolutionBackwardFilterAlgorithmMaxCount()
Returns
Type | Description |
---|---|
System.Int32 |
GetConvolutionBackwardFilterWorkspaceSize(TensorDescriptor, TensorDescriptor, ConvolutionDescriptor, FilterDescriptor, cudnnConvolutionBwdFilterAlgo)
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionBackwardFilter_v3 with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionBackwardFilter_v3. The specified algorithm can be the result of the call to cudnnGetConvolutionBackwardFilterAlgorithm or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
Declaration
public SizeT GetConvolutionBackwardFilterWorkspaceSize(TensorDescriptor xDesc, TensorDescriptor dyDesc, ConvolutionDescriptor convDesc, FilterDescriptor gradDesc, cudnnConvolutionBwdFilterAlgo algo)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
FilterDescriptor | gradDesc | Handle to a previously initialized filter descriptor. |
cudnnConvolutionBwdFilterAlgo | algo | Enumerant that specifies the chosen convolution algorithm. |
Returns
Type | Description |
---|---|
SizeT | Amount of GPU memory needed as workspace to be able to execute the backward filter convolution with the specified algo. |
GetConvolutionForwardAlgorithm(TensorDescriptor, FilterDescriptor, ConvolutionDescriptor, TensorDescriptor, cudnnConvolutionFwdPreference, SizeT)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward for the given layer specifications. Based on the input preference, this function will either return the fastest algorithm or the fastest algorithm within a given memory limit. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionForwardAlgorithm.
Declaration
public cudnnConvolutionFwdAlgo GetConvolutionForwardAlgorithm(TensorDescriptor xDesc, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, TensorDescriptor yDesc, cudnnConvolutionFwdPreference preference, SizeT memoryLimitInbytes)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
cudnnConvolutionFwdPreference | preference | Enumerant to express the preference criteria in terms of memory requirement and speed. |
SizeT | memoryLimitInbytes | Used when the enumerant preference is set to CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT to specify the maximum amount of GPU memory the user is willing to use as a workspace. |
Returns
Type | Description |
---|---|
cudnnConvolutionFwdAlgo | Enumerant that specifies which convolution algorithm should be used to compute the results according to the specified preference |
GetConvolutionForwardAlgorithm(TensorDescriptor, FilterDescriptor, ConvolutionDescriptor, TensorDescriptor, Int32)
This function serves as a heuristic for obtaining the best suited algorithm for cudnnConvolutionForward for the given layer specifications. This function will return all algorithms sorted by expected (based on internal heuristic) relative performance, with the fastest being index 0 of perfResults. For an exhaustive search for the fastest algorithm, please use cudnnFindConvolutionForwardAlgorithm.
Declaration
public cudnnConvolutionFwdAlgoPerf[] GetConvolutionForwardAlgorithm(TensorDescriptor xDesc, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, TensorDescriptor yDesc, int requestedAlgoCount)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
System.Int32 | requestedAlgoCount | The maximum number of elements to be stored in perfResults. |
Returns
Type | Description |
---|---|
cudnnConvolutionFwdAlgoPerf[] | An array to store performance metrics sorted ascending by compute time. |
GetConvolutionForwardAlgorithmMaxCount()
Declaration
public int GetConvolutionForwardAlgorithmMaxCount()
Returns
Type | Description |
---|---|
System.Int32 |
GetConvolutionForwardWorkspaceSize(TensorDescriptor, FilterDescriptor, ConvolutionDescriptor, TensorDescriptor, cudnnConvolutionFwdAlgo)
This function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionForward with the specified algorithm. The workspace allocated will then be passed to the routine cudnnConvolutionForward. The specified algorithm can be the result of the call to cudnnGetConvolutionForwardAlgorithm or can be chosen arbitrarily by the user. Note that not every algorithm is available for every configuration of the input tensor and/or every configuration of the convolution descriptor.
Declaration
public SizeT GetConvolutionForwardWorkspaceSize(TensorDescriptor xDesc, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, TensorDescriptor yDesc, cudnnConvolutionFwdAlgo algo)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
FilterDescriptor | filterDesc | Handle to a previously initialized filter descriptor. |
ConvolutionDescriptor | convDesc | Previously initialized convolution descriptor. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
cudnnConvolutionFwdAlgo | algo | Enumerant that specifies the chosen convolution algorithm |
Returns
Type | Description |
---|---|
SizeT |
GetDropoutDescriptor(DropoutDescriptor, ref Single, ref UInt64)
This function queries the fields of a previously initialized dropout descriptor.
Declaration
public CudaDeviceVariable<byte> GetDropoutDescriptor(DropoutDescriptor dropoutDesc, ref float droupout, ref ulong seed)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Previously initialized dropout descriptor. |
System.Single | droupout | The probability with which the value from input is set to 0 during the dropout layer. |
System.UInt64 | seed | Seed used to initialize random number generator states. |
Returns
Type | Description |
---|---|
CudaDeviceVariable<System.Byte> | user-allocated GPU memory that holds random number generator states. |
GetDropoutReserveSpaceSize(TensorDescriptor)
This function is used to query the amount of reserve space needed to run dropout with the input dimensions given by xDesc.
The same reserve space is expected to be passed to cudnnDropoutForward and cudnnDropoutBackward, and its contents are
expected to remain unchanged between cudnnDropoutForward and cudnnDropoutBackward calls.
Declaration
public SizeT GetDropoutReserveSpaceSize(TensorDescriptor xDesc)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor, describing input to a dropout operation. |
Returns
Type | Description |
---|---|
SizeT |
GetDropoutStateSize()
This function is used to query the amount of space required to store the states of the random number generators used by cudnnDropoutForward function.
Declaration
public SizeT GetDropoutStateSize()
Returns
Type | Description |
---|---|
SizeT |
GetReductionIndicesSize(ReduceTensorDescriptor, TensorDescriptor, TensorDescriptor)
Helper function to return the minimum size of the index space to be passed to the reduction given the input and output tensors
Declaration
public SizeT GetReductionIndicesSize(ReduceTensorDescriptor reduceTensorDesc, TensorDescriptor aDesc, TensorDescriptor cDesc)
Parameters
Type | Name | Description |
---|---|---|
ReduceTensorDescriptor | reduceTensorDesc | |
TensorDescriptor | aDesc | |
TensorDescriptor | cDesc |
Returns
Type | Description |
---|---|
SizeT |
GetReductionWorkspaceSize(ReduceTensorDescriptor, TensorDescriptor, TensorDescriptor)
Helper function to return the minimum size of the workspace to be passed to the reduction given the input and output tensors
Declaration
public SizeT GetReductionWorkspaceSize(ReduceTensorDescriptor reduceTensorDesc, TensorDescriptor aDesc, TensorDescriptor cDesc)
Parameters
Type | Name | Description |
---|---|---|
ReduceTensorDescriptor | reduceTensorDesc | |
TensorDescriptor | aDesc | |
TensorDescriptor | cDesc |
Returns
Type | Description |
---|---|
SizeT |
GetStream()
This function gets the stream to be used by the cuDNN library to execute its routines.
Declaration
public CudaStream GetStream()
Returns
Type | Description |
---|---|
CudaStream |
Im2Col(TensorDescriptor, CudaDeviceVariable<Double>, FilterDescriptor, ConvolutionDescriptor, CudaDeviceVariable<Byte>)
Declaration
public void Im2Col(TensorDescriptor srcDesc, CudaDeviceVariable<double> srcData, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, CudaDeviceVariable<byte> colBuffer)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | srcDesc | |
CudaDeviceVariable<System.Double> | srcData | |
FilterDescriptor | filterDesc | |
ConvolutionDescriptor | convDesc | |
CudaDeviceVariable<System.Byte> | colBuffer |
Im2Col(TensorDescriptor, CudaDeviceVariable<Single>, FilterDescriptor, ConvolutionDescriptor, CudaDeviceVariable<Byte>)
Declaration
public void Im2Col(TensorDescriptor srcDesc, CudaDeviceVariable<float> srcData, FilterDescriptor filterDesc, ConvolutionDescriptor convDesc, CudaDeviceVariable<byte> colBuffer)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | srcDesc | |
CudaDeviceVariable<System.Single> | srcData | |
FilterDescriptor | filterDesc | |
ConvolutionDescriptor | convDesc | |
CudaDeviceVariable<System.Byte> | colBuffer |
OpTensor(OpTensorDescriptor, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function implements the equation C = op(alpha1[0] * A, alpha2[0] * B) + beta[0] * C, given tensors A, B, and C and scaling factors alpha1, alpha2, and beta. The op to use is indicated by the descriptor opTensorDesc. Currently supported ops are listed by the cudnnOpTensorOp_t enum. Each dimension of the input tensor A must match the corresponding dimension of the destination tensor C, and each dimension of the input tensor B must match the corresponding dimension of the destination tensor C or must be equal to 1. In the latter case, the same value from the input tensor B for those dimensions will be used to blend into the C tensor. The data types of the input tensors A and B must match. If the data type of the destination tensor C is double, then the data type of the input tensors also must be double. If the data type of the destination tensor C is double, then opTensorCompType in opTensorDesc must be double. Else opTensorCompType must be float. If the input tensor B is the same tensor as the destination tensor C, then the input tensor A also must be the same tensor as the destination tensor C.
Declaration
public void OpTensor(OpTensorDescriptor op_desc, double alpha1, TensorDescriptor a_desc, CudaDeviceVariable<double> a, double alpha2, TensorDescriptor b_desc, CudaDeviceVariable<double> b, double beta, TensorDescriptor c_desc, CudaDeviceVariable<double> c)
Parameters
Type | Name | Description |
---|---|---|
OpTensorDescriptor | op_desc | Handle to a previously initialized op tensor descriptor. |
System.Double | alpha1 | Scaling factor applied to the input tensor A, as indicated by the above op equation. |
TensorDescriptor | a_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | a | Pointer to data of the tensor described by the a_desc. |
System.Double | alpha2 | Scaling factor applied to the input tensor B, as indicated by the above op equation. |
TensorDescriptor | b_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | b | Pointer to data of the tensor described by the b_desc. |
System.Double | beta | Scaling factor applied to the prior value in the destination tensor C, as indicated by the above op equation. |
TensorDescriptor | c_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | c | Output pointer to data of the tensor described by the c_desc. |
OpTensor(OpTensorDescriptor, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function implements the equation C = op(alpha1[0] * A, alpha2[0] * B) + beta[0] * C, given tensors A, B, and C and scaling factors alpha1, alpha2, and beta. The op to use is indicated by the descriptor opTensorDesc. Currently supported ops are listed by the cudnnOpTensorOp_t enum. Each dimension of the input tensor A must match the corresponding dimension of the destination tensor C, and each dimension of the input tensor B must match the corresponding dimension of the destination tensor C or must be equal to 1. In the latter case, the same value from the input tensor B for those dimensions will be used to blend into the C tensor. The data types of the input tensors A and B must match. If the data type of the destination tensor C is double, then the data type of the input tensors also must be double. If the data type of the destination tensor C is double, then opTensorCompType in opTensorDesc must be double. Else opTensorCompType must be float. If the input tensor B is the same tensor as the destination tensor C, then the input tensor A also must be the same tensor as the destination tensor C.
Declaration
public void OpTensor(OpTensorDescriptor op_desc, float alpha1, TensorDescriptor a_desc, CudaDeviceVariable<float> a, float alpha2, TensorDescriptor b_desc, CudaDeviceVariable<float> b, float beta, TensorDescriptor c_desc, CudaDeviceVariable<float> c)
Parameters
Type | Name | Description |
---|---|---|
OpTensorDescriptor | op_desc | Handle to a previously initialized op tensor descriptor. |
System.Single | alpha1 | Pointer to the scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | a_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | a | Pointer to data of the tensor described by a_desc. |
System.Single | alpha2 | Pointer to the scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | b_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | b | Pointer to data of the tensor described by b_desc. |
System.Single | beta | Pointer to the scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | c_desc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | c | Output pointer to data of the tensor described by c_desc. |
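The OpTensor equation and its broadcasting rule for B can be illustrated with a small host-side reference. This is only a sketch under stated assumptions: it runs on the CPU, does not call cuDNN or ManagedCuda, assumes 4-D tensors stored as flat arrays in NCHW order, and the names `OpTensorSketch` and `Apply` are hypothetical.

```csharp
using System;

public static class OpTensorSketch
{
    // Hypothetical host-side reference for C = op(alpha1 * A, alpha2 * B) + beta * C.
    // Assumption: tensors are flat arrays in NCHW order; dims = { n, c, h, w }.
    public static void Apply(Func<float, float, float> op,
        float alpha1, float[] a,
        float alpha2, float[] b, int[] bDims,
        float beta, float[] c, int[] cDims)
    {
        for (int i = 0; i < 4; i++)
            if (bDims[i] != cDims[i] && bDims[i] != 1)
                throw new ArgumentException("Each dimension of B must match C or be 1.");
        if (a.Length != c.Length)
            throw new ArgumentException("A must have the same dimensions as C.");

        int idx = 0;
        for (int n = 0; n < cDims[0]; n++)
        for (int ch = 0; ch < cDims[1]; ch++)
        for (int h = 0; h < cDims[2]; h++)
        for (int w = 0; w < cDims[3]; w++, idx++)
        {
            // A dimension of size 1 in B is broadcast: its index collapses to 0.
            int bn = bDims[0] == 1 ? 0 : n;
            int bc = bDims[1] == 1 ? 0 : ch;
            int bh = bDims[2] == 1 ? 0 : h;
            int bw = bDims[3] == 1 ? 0 : w;
            int bIdx = ((bn * bDims[1] + bc) * bDims[2] + bh) * bDims[3] + bw;
            c[idx] = op(alpha1 * a[idx], alpha2 * b[bIdx]) + beta * c[idx];
        }
    }
}
```

For example, with an add op, A = {1, 2, 3, 4} of shape 1x2x1x2, B = {10, 20} of shape 1x2x1x1, C initially {2, 2, 2, 2}, alpha1 = alpha2 = 1 and beta = 0.5, C becomes {12, 13, 24, 25}: each channel of A picks up the single B value for that channel.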
PoolingBackward(PoolingDescriptor, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function computes the gradient of a pooling operation.
Declaration
public void PoolingBackward(PoolingDescriptor poolingDesc, double alpha, TensorDescriptor yDesc, CudaDeviceVariable<double> y, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx)
Parameters
Type | Name | Description |
---|---|---|
PoolingDescriptor | poolingDesc | Handle to the previously initialized pooling descriptor. |
System.Double | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
TensorDescriptor | xDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the output tensor descriptor xDesc. |
System.Double | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
PoolingBackward(PoolingDescriptor, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function computes the gradient of a pooling operation.
Declaration
public void PoolingBackward(PoolingDescriptor poolingDesc, float alpha, TensorDescriptor yDesc, CudaDeviceVariable<float> y, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx)
Parameters
Type | Name | Description |
---|---|---|
PoolingDescriptor | poolingDesc | Handle to the previously initialized pooling descriptor. |
System.Single | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
TensorDescriptor | xDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the output tensor descriptor xDesc. |
System.Single | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
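To see why the backward pass takes y, dy and x together, the following host-side sketch routes each incoming gradient to the input element that produced the pooled maximum. It is an illustration only, not the library's implementation: it assumes max pooling over a single row-major plane with no padding, gives the gradient to the first matching element on ties, and the names `PoolingBackwardSketch` and `MaxPoolBackward` are hypothetical.

```csharp
using System;

public static class PoolingBackwardSketch
{
    // Hypothetical host-side max-pooling gradient for one h x w plane (row-major), no padding.
    // x is the forward input, y the forward output, dy the gradient with respect to y.
    // Writes dx = alpha * grad + beta * dx.
    public static void MaxPoolBackward(float alpha, float[] y, float[] dy, float[] x,
        int h, int w, int window, int stride, float beta, float[] dx)
    {
        int outH = (h - window) / stride + 1;
        int outW = (w - window) / stride + 1;
        var grad = new float[h * w]; // zero-initialized accumulator

        for (int oy = 0; oy < outH; oy++)
        for (int ox = 0; ox < outW; ox++)
        {
            int o = oy * outW + ox;
            bool routed = false;
            for (int ky = 0; ky < window && !routed; ky++)
            for (int kx = 0; kx < window && !routed; kx++)
            {
                int i = (oy * stride + ky) * w + (ox * stride + kx);
                // Assumption: on ties, the first matching element receives the gradient.
                if (x[i] == y[o]) { grad[i] += dy[o]; routed = true; }
            }
        }

        for (int i = 0; i < dx.Length; i++)
            dx[i] = alpha * grad[i] + beta * dx[i];
    }
}
```

With a 4x4 input holding 1 through 16, a 2x2 window with stride 2, y = {6, 8, 14, 16} and dy = {1, 2, 3, 4}, the sketch writes the four gradients to input positions 5, 7, 13 and 15 and leaves every other element of dx at zero (alpha = 1, beta = 0).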
PoolingForward(PoolingDescriptor, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
Declaration
public void PoolingForward(PoolingDescriptor poolingDesc, double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
PoolingDescriptor | poolingDesc | Handle to a previously initialized pooling descriptor. |
System.Double | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Double | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
PoolingForward(PoolingDescriptor, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.
Declaration
public void PoolingForward(PoolingDescriptor poolingDesc, float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
PoolingDescriptor | poolingDesc | Handle to a previously initialized pooling descriptor. |
System.Single | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Single | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
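The pooling computation and the alpha/beta blend described in the parameter tables can be sketched on the host as follows. This is a minimal illustration, not the cuDNN implementation: it assumes max pooling over a single row-major plane, a square window, and no padding, and the names `PoolingSketch` and `MaxPoolForward` are hypothetical.

```csharp
using System;

public static class PoolingSketch
{
    // Hypothetical host-side max pooling over one h x w plane (row-major), no padding.
    // Writes y = alpha * max(window) + beta * y, mirroring the documented blend.
    public static void MaxPoolForward(float alpha, float[] x, int h, int w,
        int window, int stride, float beta, float[] y)
    {
        int outH = (h - window) / stride + 1;
        int outW = (w - window) / stride + 1;
        if (y.Length != outH * outW)
            throw new ArgumentException("y must hold outH * outW elements.");

        for (int oy = 0; oy < outH; oy++)
        for (int ox = 0; ox < outW; ox++)
        {
            // Scan the window anchored at (oy * stride, ox * stride) for its maximum.
            float best = float.NegativeInfinity;
            for (int ky = 0; ky < window; ky++)
            for (int kx = 0; kx < window; kx++)
                best = Math.Max(best, x[(oy * stride + ky) * w + (ox * stride + kx)]);

            int o = oy * outW + ox;
            y[o] = alpha * best + beta * y[o];
        }
    }
}
```

For a 4x4 input holding 1 through 16 with a 2x2 window and stride 2, the 2x2 output is {6, 8, 14, 16} when alpha = 1 and beta = 0, which shows the reduction in height and width.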
QueryRuntimeError(cudnnErrQueryMode)
cuDNN library functions perform extensive input argument checking before launching GPU kernels. The last step is to verify that the GPU kernel actually started. When a kernel fails to start, CUDNN_STATUS_EXECUTION_FAILED is returned by the corresponding API call. Typically, after a GPU kernel starts, no runtime checks are performed by the kernel itself; numerical results are simply written to output buffers.
When the CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode is selected in cudnnBatchNormalizationForwardTraining or cudnnBatchNormalizationBackward, the algorithm may encounter numerical overflows where CUDNN_BATCHNORM_SPATIAL performs just fine albeit at a slower speed.
The user can invoke cudnnQueryRuntimeError to make sure numerical overflows did not occur during the kernel execution. Those issues are reported by the kernel that performs computations.
Declaration
public cudnnStatus QueryRuntimeError(cudnnErrQueryMode mode)
Parameters
Type | Name | Description |
---|---|---|
cudnnErrQueryMode | mode | Remote error query mode. |
Returns
Type | Description |
---|---|
cudnnStatus | The runtime error status returned by the query. |
ReduceTensor(ReduceTensorDescriptor, CudaDeviceVariable<UInt32>, CudaDeviceVariable<Byte>, SizeT, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function reduces tensor A by implementing the equation C = alpha * reduce op ( A ) + beta * C, given tensors A and C and scaling factors alpha and beta. The reduction op to use is indicated by the descriptor reduceTensorDesc. Currently supported ops are listed by the cudnnReduceTensorOp_t enum.
Declaration
public void ReduceTensor(ReduceTensorDescriptor reduceTensorDesc, CudaDeviceVariable<uint> indices, CudaDeviceVariable<byte> workspace, SizeT workspaceSizeInBytes, double alpha, TensorDescriptor aDesc, CudaDeviceVariable<double> A, double beta, TensorDescriptor cDesc, CudaDeviceVariable<double> C)
Parameters
Type | Name | Description |
---|---|---|
ReduceTensorDescriptor | reduceTensorDesc | Handle to a previously initialized reduce tensor descriptor. |
CudaDeviceVariable<System.UInt32> | indices | Handle to a previously allocated space for writing indices. |
CudaDeviceVariable<System.Byte> | workspace | Handle to a previously allocated space for the reduction implementation. |
SizeT | workspaceSizeInBytes | Size of the above previously allocated space. |
System.Double | alpha | Pointer to scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | aDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | A | Pointer to data of the tensor described by the aDesc descriptor. |
System.Double | beta | Pointer to scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | cDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | C | Pointer to data of the tensor described by the cDesc descriptor. |
ReduceTensor(ReduceTensorDescriptor, CudaDeviceVariable<UInt32>, CudaDeviceVariable<Byte>, SizeT, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function reduces tensor A by implementing the equation C = alpha * reduce op ( A ) + beta * C, given tensors A and C and scaling factors alpha and beta. The reduction op to use is indicated by the descriptor reduceTensorDesc. Currently supported ops are listed by the cudnnReduceTensorOp_t enum.
Declaration
public void ReduceTensor(ReduceTensorDescriptor reduceTensorDesc, CudaDeviceVariable<uint> indices, CudaDeviceVariable<byte> workspace, SizeT workspaceSizeInBytes, float alpha, TensorDescriptor aDesc, CudaDeviceVariable<float> A, float beta, TensorDescriptor cDesc, CudaDeviceVariable<float> C)
Parameters
Type | Name | Description |
---|---|---|
ReduceTensorDescriptor | reduceTensorDesc | Handle to a previously initialized reduce tensor descriptor. |
CudaDeviceVariable<System.UInt32> | indices | Handle to a previously allocated space for writing indices. |
CudaDeviceVariable<System.Byte> | workspace | Handle to a previously allocated space for the reduction implementation. |
SizeT | workspaceSizeInBytes | Size of the above previously allocated space. |
System.Single | alpha | Pointer to scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | aDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | A | Pointer to data of the tensor described by the aDesc descriptor. |
System.Single | beta | Pointer to scaling factor (in host memory) used to blend the source value with prior value in the destination tensor as indicated by the above op equation. |
TensorDescriptor | cDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | C | Pointer to data of the tensor described by the cDesc descriptor. |
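The reduction equation C = alpha * reduce op ( A ) + beta * C can be made concrete with a host-side sketch. It rests on assumptions not stated on this page: tensors are 4-D flat arrays in NCHW order, and a dimension of size 1 in C marks a dimension of A that is reduced. The indices and workspace buffers are not modeled, and the names `ReduceTensorSketch` and `Reduce` are hypothetical.

```csharp
using System;

public static class ReduceTensorSketch
{
    // Hypothetical host-side reference for C = alpha * reduce(A) + beta * C.
    // Assumption: dims = { n, c, h, w }; a size-1 dimension in cDims is reduced over.
    // seed is the identity of op (0 for add, negative infinity for max, and so on).
    public static void Reduce(Func<float, float, float> op, float seed,
        float alpha, float[] a, int[] aDims,
        float beta, float[] c, int[] cDims)
    {
        var acc = new float[c.Length];
        for (int i = 0; i < acc.Length; i++) acc[i] = seed;

        int idx = 0;
        for (int n = 0; n < aDims[0]; n++)
        for (int ch = 0; ch < aDims[1]; ch++)
        for (int h = 0; h < aDims[2]; h++)
        for (int w = 0; w < aDims[3]; w++, idx++)
        {
            // Reduced dimensions collapse to index 0 in C.
            int cn = cDims[0] == 1 ? 0 : n;
            int cc = cDims[1] == 1 ? 0 : ch;
            int cy = cDims[2] == 1 ? 0 : h;
            int cx = cDims[3] == 1 ? 0 : w;
            int cIdx = ((cn * cDims[1] + cc) * cDims[2] + cy) * cDims[3] + cx;
            acc[cIdx] = op(acc[cIdx], a[idx]);
        }

        for (int i = 0; i < c.Length; i++)
            c[i] = alpha * acc[i] + beta * c[i];
    }
}
```

Summing A = {1, 2, 3, 4, 5, 6} of shape 1x1x2x3 into C of shape 1x1x2x1 gives row sums 6 and 15; with alpha = 0.5, beta = 1 and C initially {1, 1}, the result is {4, 8.5}.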
RestoreDropoutDescriptor(DropoutDescriptor, CudaDeviceVariable<Byte>, ref Single, ref UInt64)
This function restores a dropout descriptor to a previously saved-off state.
Declaration
public void RestoreDropoutDescriptor(DropoutDescriptor dropoutDesc, CudaDeviceVariable<byte> states, ref float droupout, ref ulong seed)
Parameters
Type | Name | Description |
---|---|---|
DropoutDescriptor | dropoutDesc | Previously created dropout descriptor. |
CudaDeviceVariable<System.Byte> | states | Pointer to GPU memory that holds random number generator states initialized by a prior call to cudnnSetDropoutDescriptor. |
System.Single | droupout | Probability with which the value from an input tensor is set to 0 when performing dropout. |
System.UInt64 | seed | Seed used in the prior call to cudnnSetDropoutDescriptor that initialized the states buffer. Using a different seed has no effect. To change the seed and update the random number generator states, call cudnnSetDropoutDescriptor. |
ScaleTensor(TensorDescriptor, CudaDeviceVariable<Double>, Double)
This function scales all the elements of a tensor by a given factor.
Declaration
public void ScaleTensor(TensorDescriptor yDesc, CudaDeviceVariable<double> y, double alpha)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Pointer to data of the tensor described by the yDesc descriptor. |
System.Double | alpha | Pointer in host memory to a value that all elements of the tensor will be scaled with. |
ScaleTensor(TensorDescriptor, CudaDeviceVariable<Single>, Single)
This function scales all the elements of a tensor by a given factor.
Declaration
public void ScaleTensor(TensorDescriptor yDesc, CudaDeviceVariable<float> y, float alpha)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Pointer to data of the tensor described by the yDesc descriptor. |
System.Single | alpha | Pointer in host memory to a value that all elements of the tensor will be scaled with. |
SetRNNDescriptor(RNNDescriptor, Int32, Int32, DropoutDescriptor, cudnnRNNInputMode, cudnnDirectionMode, cudnnRNNMode, cudnnRNNAlgo, cudnnDataType)
This function initializes a previously created RNN descriptor object.
Declaration
public void SetRNNDescriptor(RNNDescriptor rnnDesc, int hiddenSize, int numLayers, DropoutDescriptor dropoutDesc, cudnnRNNInputMode inputMode, cudnnDirectionMode direction, cudnnRNNMode mode, cudnnRNNAlgo algo, cudnnDataType dataType)
Parameters
Type | Name | Description |
---|---|---|
RNNDescriptor | rnnDesc | A previously created RNN descriptor. |
System.Int32 | hiddenSize | Size of the internal hidden state for each layer. |
System.Int32 | numLayers | Number of stacked layers. |
DropoutDescriptor | dropoutDesc | Handle to a previously created and initialized dropout descriptor. Dropout will be applied between layers (e.g., a single-layer network will have no dropout applied). |
cudnnRNNInputMode | inputMode | Specifies the behavior at the input to the first layer. |
cudnnDirectionMode | direction | Specifies the recurrence pattern (e.g., bidirectional). |
cudnnRNNMode | mode | Specifies the type of RNN to compute. |
cudnnRNNAlgo | algo | Specifies which RNN algorithm should be used to compute the results. |
cudnnDataType | dataType | Compute precision. |
SetStream(CudaStream)
This function sets the stream to be used by the cuDNN library to execute its routines.
Declaration
public void SetStream(CudaStream stream)
Parameters
Type | Name | Description |
---|---|---|
CudaStream | stream | The stream to be used by the library. |
SetTensor(TensorDescriptor, CudaDeviceVariable<Double>, Double)
This function sets all the elements of a tensor to a given value.
Declaration
public void SetTensor(TensorDescriptor yDesc, CudaDeviceVariable<double> y, double value)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Pointer to data of the tensor described by the yDesc descriptor. |
System.Double | value | Pointer in host memory to a value that all elements of the tensor will be set to. |
SetTensor(TensorDescriptor, CudaDeviceVariable<Single>, Single)
This function sets all the elements of a tensor to a given value.
Declaration
public void SetTensor(TensorDescriptor yDesc, CudaDeviceVariable<float> y, float value)
Parameters
Type | Name | Description |
---|---|---|
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Pointer to data of the tensor described by the yDesc descriptor. |
System.Single | value | Pointer in host memory to a value that all elements of the tensor will be set to. |
SoftmaxBackward(cudnnSoftmaxAlgorithm, cudnnSoftmaxMode, Double, TensorDescriptor, CudaDeviceVariable<Double>, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This routine computes the gradient of the softmax function.
Declaration
public void SoftmaxBackward(cudnnSoftmaxAlgorithm algorithm, cudnnSoftmaxMode mode, double alpha, TensorDescriptor yDesc, CudaDeviceVariable<double> y, TensorDescriptor dyDesc, CudaDeviceVariable<double> dy, double beta, TensorDescriptor dxDesc, CudaDeviceVariable<double> dx)
Parameters
Type | Name | Description |
---|---|---|
cudnnSoftmaxAlgorithm | algorithm | Enumerant to specify the softmax algorithm. |
cudnnSoftmaxMode | mode | Enumerant to specify the softmax mode. |
System.Double | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
System.Double | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Double> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
SoftmaxBackward(cudnnSoftmaxAlgorithm, cudnnSoftmaxMode, Single, TensorDescriptor, CudaDeviceVariable<Single>, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This routine computes the gradient of the softmax function.
Declaration
public void SoftmaxBackward(cudnnSoftmaxAlgorithm algorithm, cudnnSoftmaxMode mode, float alpha, TensorDescriptor yDesc, CudaDeviceVariable<float> y, TensorDescriptor dyDesc, CudaDeviceVariable<float> dy, float beta, TensorDescriptor dxDesc, CudaDeviceVariable<float> dx)
Parameters
Type | Name | Description |
---|---|---|
cudnnSoftmaxAlgorithm | algorithm | Enumerant to specify the softmax algorithm. |
cudnnSoftmaxMode | mode | Enumerant to specify the softmax mode. |
System.Single | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the tensor descriptor yDesc. |
TensorDescriptor | dyDesc | Handle to the previously initialized input differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dy | Data pointer to GPU memory associated with the tensor descriptor dyDesc. |
System.Single | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | dxDesc | Handle to the previously initialized output differential tensor descriptor. |
CudaDeviceVariable<System.Single> | dx | Data pointer to GPU memory associated with the output tensor descriptor dxDesc. |
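The softmax gradient, combined with the documented alpha/beta blend, can be sketched on the host. This is an illustration under assumptions, not the library's code: it covers plain softmax (not a log variant) over one vector that forms a single softmax group, and the names `SoftmaxBackwardSketch` and `Backward` are hypothetical.

```csharp
using System;

public static class SoftmaxBackwardSketch
{
    // Hypothetical host-side softmax gradient for one softmax group.
    // y is the forward softmax output, dy the gradient with respect to y.
    // Writes dx[i] = alpha * y[i] * (dy[i] - sum_j dy[j] * y[j]) + beta * dx[i].
    public static void Backward(float alpha, float[] y, float[] dy, float beta, float[] dx)
    {
        if (y.Length != dy.Length || y.Length != dx.Length)
            throw new ArgumentException("y, dy and dx must have the same length.");

        // The dot product of dy and y is shared by every element of the gradient.
        float dot = 0f;
        for (int i = 0; i < y.Length; i++)
            dot += dy[i] * y[i];

        for (int i = 0; i < y.Length; i++)
            dx[i] = alpha * y[i] * (dy[i] - dot) + beta * dx[i];
    }
}
```

With y = {0.5, 0.5}, dy = {1, 0}, alpha = 1 and beta = 0, the sketch yields dx = {0.25, -0.25}. Only y and dy are needed, which matches the absence of an x argument in the signatures above.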
SoftmaxForward(cudnnSoftmaxAlgorithm, cudnnSoftmaxMode, Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This routine computes the softmax function.
Declaration
public void SoftmaxForward(cudnnSoftmaxAlgorithm algorithm, cudnnSoftmaxMode mode, double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
cudnnSoftmaxAlgorithm | algorithm | Enumerant to specify the softmax algorithm. |
cudnnSoftmaxMode | mode | Enumerant to specify the softmax mode. |
System.Double | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Double | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
SoftmaxForward(cudnnSoftmaxAlgorithm, cudnnSoftmaxMode, Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This routine computes the softmax function.
Declaration
public void SoftmaxForward(cudnnSoftmaxAlgorithm algorithm, cudnnSoftmaxMode mode, float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
cudnnSoftmaxAlgorithm | algorithm | Enumerant to specify the softmax algorithm. |
cudnnSoftmaxMode | mode | Enumerant to specify the softmax mode. |
System.Single | alpha | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to the previously initialized input tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Data pointer to GPU memory associated with the tensor descriptor xDesc. |
System.Single | beta | Pointer to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to the previously initialized output tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Data pointer to GPU memory associated with the output tensor descriptor yDesc. |
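A host-side sketch of the softmax computation with the alpha/beta blend is shown below. It is an assumption-laden illustration rather than the cuDNN kernel: it treats the input as one softmax group, subtracts the maximum before exponentiating to avoid overflow, and the names `SoftmaxSketch` and `Forward` are hypothetical.

```csharp
using System;

public static class SoftmaxSketch
{
    // Hypothetical host-side softmax over one group of values.
    // Writes y[i] = alpha * softmax(x)[i] + beta * y[i].
    public static void Forward(float alpha, float[] x, float beta, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("x and y must have the same length.");

        // Subtracting the maximum keeps every exponent at or below zero.
        float max = float.NegativeInfinity;
        foreach (float v in x) max = Math.Max(max, v);

        var e = new double[x.Length];
        double sum = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            e[i] = Math.Exp(x[i] - max);
            sum += e[i];
        }

        for (int i = 0; i < x.Length; i++)
            y[i] = alpha * (float)(e[i] / sum) + beta * y[i];
    }
}
```

Because of the max subtraction, an input such as {1000, 1000} still produces {0.5, 0.5} instead of overflowing.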
TransformTensor(Double, TensorDescriptor, CudaDeviceVariable<Double>, Double, TensorDescriptor, CudaDeviceVariable<Double>)
This function copies the scaled data from one tensor to another tensor with a different layout. The two tensor descriptors must have the same dimensions but not necessarily the same strides. The input and output tensors must not overlap in any way (i.e., tensors cannot be transformed in place). This function can be used to convert a tensor with an unsupported format to a supported one.
Declaration
public void TransformTensor(double alpha, TensorDescriptor xDesc, CudaDeviceVariable<double> x, double beta, TensorDescriptor yDesc, CudaDeviceVariable<double> y)
Parameters
Type | Name | Description |
---|---|---|
System.Double | alpha | Pointer to scaling factors (in host memory) used to blend the source value with prior value in the destination tensor as follows: dstValue = alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | x | Pointer to data of the tensor described by the xDesc descriptor. |
System.Double | beta | Pointer to scaling factors (in host memory) used to blend the source value with prior value in the destination tensor as follows: dstValue = alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Double> | y | Pointer to data of the tensor described by the yDesc descriptor. |
TransformTensor(Single, TensorDescriptor, CudaDeviceVariable<Single>, Single, TensorDescriptor, CudaDeviceVariable<Single>)
This function copies the scaled data from one tensor to another tensor with a different layout. The two tensor descriptors must have the same dimensions but not necessarily the same strides. The input and output tensors must not overlap in any way (i.e., tensors cannot be transformed in place). This function can be used to convert a tensor with an unsupported format to a supported one.
Declaration
public void TransformTensor(float alpha, TensorDescriptor xDesc, CudaDeviceVariable<float> x, float beta, TensorDescriptor yDesc, CudaDeviceVariable<float> y)
Parameters
Type | Name | Description |
---|---|---|
System.Single | alpha | Pointer to scaling factors (in host memory) used to blend the source value with prior value in the destination tensor as follows: dstValue = alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | xDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | x | Pointer to data of the tensor described by the xDesc descriptor. |
System.Single | beta | Pointer to scaling factors (in host memory) used to blend the source value with prior value in the destination tensor as follows: dstValue = alpha[0]*srcValue + beta[0]*priorDstValue. Please refer to this section for additional details. |
TensorDescriptor | yDesc | Handle to a previously initialized tensor descriptor. |
CudaDeviceVariable<System.Single> | y | Pointer to data of the tensor described by the yDesc descriptor. |
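A layout change with the documented blend dstValue = alpha[0]*srcValue + beta[0]*priorDstValue can be sketched on the host as below. The choice of NCHW as source and NHWC as destination is an assumption made for illustration, the code does not call cuDNN, and the names `TransformTensorSketch` and `NchwToNhwc` are hypothetical.

```csharp
using System;

public static class TransformTensorSketch
{
    // Hypothetical host-side copy from an NCHW source to an NHWC destination.
    // Both tensors share the dimensions (n, c, h, w); only the strides differ.
    // Writes y[dst] = alpha * x[src] + beta * y[dst]; x and y must be distinct arrays.
    public static void NchwToNhwc(float alpha, float[] x, int n, int c, int h, int w,
        float beta, float[] y)
    {
        if (ReferenceEquals(x, y))
            throw new ArgumentException("In-place transformation is not supported.");
        if (x.Length != n * c * h * w || y.Length != x.Length)
            throw new ArgumentException("x and y must both hold n * c * h * w elements.");

        for (int ni = 0; ni < n; ni++)
        for (int ci = 0; ci < c; ci++)
        for (int hi = 0; hi < h; hi++)
        for (int wi = 0; wi < w; wi++)
        {
            int src = ((ni * c + ci) * h + hi) * w + wi; // NCHW offset
            int dst = ((ni * h + hi) * w + wi) * c + ci; // NHWC offset
            y[dst] = alpha * x[src] + beta * y[dst];
        }
    }
}
```

For a 1x2x1x2 tensor x = {1, 2, 3, 4} with alpha = 2 and beta = 0, the NHWC result is {2, 6, 4, 8}: the two channel values for each pixel become adjacent in memory.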