cnn_batch

author: Edge
date: 2023-04-03 23:10:10

Introduction

Deep learning has revolutionized the field of artificial intelligence by enabling computers to learn from large amounts of data. Convolutional Neural Networks (CNNs) are a type of deep learning algorithm that has proven very successful in image and video analysis. In this blog, we will discuss the theory behind the CNN computing pipeline, the definition of a 4D matrix, and the details of the functions used in this pipeline.

CNN Computing Pipeline

The CNN computing pipeline consists of several stages: input, convolution, pooling, and output. The input is an image or video frame that is fed into the network. The convolution stage applies filters to the input to extract features. The pooling stage downsamples the feature maps to reduce their dimensionality. Finally, the output stage applies fully connected layers to produce the final classification result.
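As an illustration of the pooling stage described above, here is a minimal sketch of max pooling, one common way to downsample a feature map. It is not part of the code in this post: the `Grid` alias and the `max_pool` function are hypothetical names, and the sketch assumes a non-empty rectangular input with no padding.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Max pooling with a square window. Windows that would run past the border
// are dropped (no padding). Assumes `in` is rectangular and has at least
// `window` rows and columns.
Grid max_pool(const Grid& in, std::size_t window, std::size_t stride) {
    const std::size_t out_rows = (in.size() - window) / stride + 1;
    const std::size_t out_cols = (in[0].size() - window) / stride + 1;
    Grid out(out_rows, std::vector<float>(out_cols));
    for (std::size_t i = 0; i < out_rows; ++i) {
        for (std::size_t j = 0; j < out_cols; ++j) {
            // Start from the top-left cell of the window, then scan the rest.
            float best = in[i * stride][j * stride];
            for (std::size_t k = 0; k < window; ++k)
                for (std::size_t l = 0; l < window; ++l)
                    best = std::max(best, in[i * stride + k][j * stride + l]);
            out[i][j] = best;
        }
    }
    return out;
}
```

With a 2 x 2 window and a stride of 2, a 4 x 4 feature map shrinks to 2 x 2, and each output cell keeps the largest value of its window.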

The convolution stage is the most important part of the pipeline. It applies a set of filters to the input image or video to extract features. Each filter produces a feature map that highlights certain aspects of the input. The convolution operation is defined by the following equation:

\begin{equation} y_{i, j} = \sum_{k = 1}^{K} \sum_{l = 1}^{L} x_{i + k - 1,\, j + l - 1} \, w_{k, l} \tag{1} \end{equation}

where x is the input matrix, w is the K x L filter matrix, and y is the output matrix. The filter is slid over the input matrix, and at each position the output is the sum of the elementwise products between the filter and the local region of the input it covers. Repeating this for every position where the filter fits inside the input produces the output matrix. As written, equation (1) corresponds to a stride of 1 and no padding.
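Equation (1) can be sketched directly in code. The version below is only an illustration and is separate from the post's own functions: `Grid` and `conv2d_valid` are hypothetical names, indices are 0-based, and the sketch assumes stride 1, no padding, and a filter no larger than the input.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// y[i][j] = sum over k, l of x[i + k][j + l] * w[k][l]
// (the 0-based form of equation (1)).
Grid conv2d_valid(const Grid& x, const Grid& w) {
    const std::size_t K = w.size();
    const std::size_t L = w[0].size();
    const std::size_t out_rows = x.size() - K + 1;
    const std::size_t out_cols = x[0].size() - L + 1;
    Grid y(out_rows, std::vector<float>(out_cols, 0.0f));
    for (std::size_t i = 0; i < out_rows; ++i)
        for (std::size_t j = 0; j < out_cols; ++j)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t l = 0; l < L; ++l)
                    y[i][j] += x[i + k][j + l] * w[k][l];
    return y;
}
```

For a 3 x 3 input holding the values 1 to 9 row by row and a 2 x 2 filter of ones, the top-left output is 1 + 2 + 4 + 5 = 12, and the full output is a 2 x 2 matrix.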

The code for this function is as follows:


Matrix3d conv_test_with_output(Matrix3d mid1, 
                                int input_dim = 3, 
                                int output_channels = 3, 
                                int stride = 1, 
                                int kernel_size = 2, 
                                int mode = 0, 
                                bool verbose = false)
{
    if (verbose) {
        cout << "Input Matrix3d: " << endl;
        cout_mat3d(mid1);
        cout << "Parameters: input_dim = " << input_dim 
             << ", output_channels = " << output_channels 
             << ", stride = " << stride 
             << ", kernel_size = " << kernel_size 
             << ", mode = " << mode << endl;
    }

    // Compute padding widths and heights
    int padding_wid = stride - (mid1.wid - kernel_size) % stride;
    if (padding_wid == stride) {
        padding_wid = 0;
    }
    int padding_high = stride - (mid1.high - kernel_size) % stride;
    if (padding_high == stride) {
        padding_high = 0;
    }
    if (verbose) {
        cout << "Padding widths: " << padding_wid << ", padding heights: " << padding_high << endl;
    }

    // Pad each RGB channel in the 3D matrix
    Matrix mid_rgb[input_dim];
    for (int rgb_idx = 0; rgb_idx < input_dim; rgb_idx++)
    {   
        mid_rgb[rgb_idx] = edge_padding(mid1.matrix3d[rgb_idx], 
                                         mid1.matrix3d[rgb_idx].row + padding_high, 
                                         mid1.matrix3d[rgb_idx].col + padding_wid);
        if (verbose) {
            cout << "RGB[" << rgb_idx << "] channel after padding: " << endl;
            cout_mat(mid_rgb[rgb_idx]);
        }
    }

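    // Construct filters: one all-ones kernel per (input channel, output filter) pair.
    // The array is indexed as filters[input channel][output filter] below, so it is
    // declared in that order; the reversed declaration overruns the array whenever
    // input_dim != output_channels.
    Matrix filters[input_dim][output_channels];
    for (int channel_index = 0; channel_index < input_dim; channel_index++)
    {
        for (int filter_index = 0; filter_index < output_channels; filter_index++)
        {
            Matrix kernel = ones(kernel_size, kernel_size);
            filters[channel_index][filter_index] = kernel;
        }
    }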

    // Compute convolution results for each filter
    Matrix feature_maps[output_channels];
    for (int filter_idx = 0; filter_idx < output_channels; filter_idx++)
    {
        Matrix sum_rgb = CreateMatrix(((mid1.wid - kernel_size + 2*padding_wid) / stride) + 1, 
                                      ((mid1.high - kernel_size + 2*padding_high) / stride) + 1);
        for (int channel_idx = 0; channel_idx < input_dim; channel_idx++)
        {
            // Compute convolution result for a single RGB channel and a single filter
            Matrix element = conv_element(mid_rgb[channel_idx], 
                                          filters[channel_idx][filter_idx], 
                                          kernel_size, stride);
            if (verbose) {
                cout << "Convolution of RGB[" << channel_idx << "] channel with Filter[" 
                     << filter_idx << "] : " << endl;
                cout_mat(mid_rgb[channel_idx]);
                cout << " * " << endl;
                cout_mat(filters[channel_idx][filter_idx]);
                cout << " = " << endl;
                cout_mat(element);
                cout << endl;
            }
            // Sum convolution results for each RGB channel
            sum_rgb = add(sum_rgb, element, 0);
        }
        feature_maps[filter_idx] = sum_rgb;
        if (verbose) {
            cout << "Feature map [" << filter_idx << "] : " << endl;
            cout_mat(feature_maps[filter_idx]);
        }
    }
    // Construct 3D matrix to store different feature maps at different depths
    Matrix3d output3d = CreateMatrix3d(output_channels, feature_maps[0].row, feature_maps[0].col);
    for (int i = 0; i < output_channels; i++)
    {
        output3d.matrix3d[i] = feature_maps[i];
    }
    if (verbose) {
        cout << "Output Matrix3d: " << endl;
        cout_mat3d(output3d);
    }
    return output3d;
}
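The padding and output-size arithmetic in the function above can be checked in isolation. The sketch below mirrors the padding rule for a single spatial axis. `ConvShape` and `conv_shape` are hypothetical names, and the sketch assumes that the second and third arguments of edge_padding are the target row and column counts, so the padded size is `size + padding`. It also assumes `size >= kernel_size`.

```cpp
// Result of the shape computation for one axis (width or height).
struct ConvShape {
    int padding;  // extra cells added so the stride fits evenly
    int output;   // number of kernel positions along this axis
};

// Mirrors the padding rule used in conv_test_with_output for one axis.
ConvShape conv_shape(int size, int kernel_size, int stride) {
    int padding = stride - (size - kernel_size) % stride;
    if (padding == stride) {
        padding = 0;  // the stride already divides (size - kernel_size)
    }
    // (size - kernel_size + padding) is now a multiple of stride.
    int output = (size - kernel_size + padding) / stride + 1;
    return {padding, output};
}
```

For example, a width of 5 with kernel_size = 2 and stride = 2 needs one cell of padding and yields 3 output positions. Because the padding is always smaller than the stride, the expression with `2*padding` used when creating sum_rgb gives the same quotient under integer division: (5 - 2 + 2) / 2 + 1 is also 3.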

This function computes the convolution for a single sample, and it can be extended to batch computation as follows:


Matrix4d batch_conv_test(Matrix4d mid4, 
                         int input_dim = 3, 
                         int output_channels = 3, 
                         int stride = 1, 
                         int kernel_size = 2, 
                         int mode = 0,
                         bool verbose = true)
{
    Matrix3d *output3d_arr = (Matrix3d *)malloc(mid4.batch * sizeof(Matrix3d));
    for (int batch_idx = 0; batch_idx < mid4.batch; batch_idx++)
    {
        Matrix3d mid3 = mid4.matrix4d[batch_idx];
        Matrix3d output3d = conv_test_with_output(mid3, input_dim, output_channels, stride, kernel_size, mode,verbose);
        output3d_arr[batch_idx] = output3d;
    }

    Matrix4d output4d = CreateMatrix4d(mid4.batch, output_channels, output3d_arr[0].wid, output3d_arr[0].high);
    for (int batch_idx = 0; batch_idx < mid4.batch; batch_idx++)
    {
        output4d.matrix4d[batch_idx] = output3d_arr[batch_idx];
    }
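    // The Matrix3d structs were copied into output4d, so only the temporary array is released
    free(output3d_arr);
    return output4d;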
}

About the matrix definitions:

4D Matrix Definition

In CNNs, we often work with 4D matrices. A 4D matrix is a collection of 3D matrices, where each 3D matrix represents a single example. The four dimensions are batch, depth, width, and height. The batch dimension represents the number of examples in a batch. The depth dimension represents the number of channels of each example (for the output of a convolution, this is the number of filters). The width and height dimensions represent the spatial dimensions of each channel.


typedef struct
{
    int row, col;
    float **matrix;
} Matrix;
typedef struct
{
    int row, col;
    string **str_matrix;
} str_Matrix;
typedef struct
{
    int dep, wid, high;
    Matrix *matrix3d;
} Matrix3d;
typedef struct
{
    int batch, dep, wid, high;
    Matrix3d *matrix4d;
} Matrix4d;
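To show how these nested structs fit together, here is a minimal allocation sketch. It does not reproduce the post's CreateMatrix or CreateMatrix3d, whose implementations are not shown: `alloc_matrix`, `alloc_matrix3d`, and `free_matrix3d` are hypothetical helpers, and the sketch assumes that `row` corresponds to `high` and `col` to `wid`. Allocation failures are not handled.

```cpp
#include <cstdlib>

// Same layout as the Matrix and Matrix3d structs above, repeated so the
// sketch compiles on its own.
typedef struct { int row, col; float **matrix; } Matrix;
typedef struct { int dep, wid, high; Matrix *matrix3d; } Matrix3d;

// Hypothetical helper: allocate a zero-filled row x col matrix.
Matrix alloc_matrix(int row, int col) {
    Matrix m;
    m.row = row;
    m.col = col;
    m.matrix = (float **)std::malloc(row * sizeof(float *));
    for (int r = 0; r < row; r++) {
        m.matrix[r] = (float *)std::calloc(col, sizeof(float));
    }
    return m;
}

// Hypothetical helper: a stack of `dep` matrices, each with `high` rows
// and `wid` columns (assumed mapping: row = high, col = wid).
Matrix3d alloc_matrix3d(int dep, int wid, int high) {
    Matrix3d m3;
    m3.dep = dep;
    m3.wid = wid;
    m3.high = high;
    m3.matrix3d = (Matrix *)std::malloc(dep * sizeof(Matrix));
    for (int d = 0; d < dep; d++) {
        m3.matrix3d[d] = alloc_matrix(high, wid);
    }
    return m3;
}

// Release everything owned by a Matrix3d, innermost rows first.
void free_matrix3d(Matrix3d &m3) {
    for (int d = 0; d < m3.dep; d++) {
        for (int r = 0; r < m3.matrix3d[d].row; r++) {
            std::free(m3.matrix3d[d].matrix[r]);
        }
        std::free(m3.matrix3d[d].matrix);
    }
    std::free(m3.matrix3d);
    m3.matrix3d = nullptr;
    m3.dep = 0;
}
```

A Matrix4d would follow the same pattern one level up: an array of `batch` Matrix3d values, each allocated as above.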

Functions Used in the Pipeline

The code provided in this blog includes two functions: conv_test_with_output and batch_conv_test.

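The conv_test_with_output function takes a 3D matrix (a single example) as input and applies a set of filters to extract features. It returns a 3D matrix whose depth equals output_channels, with one feature map per filter. The function uses edge padding to handle the border of the input, adding just enough rows and columns for the stride to divide the padded size evenly. In this test version every filter is a kernel_size x kernel_size matrix of ones, with one such kernel for each pair of input channel and output filter; the per-channel results are summed to form each feature map.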
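The implementation of edge_padding is not shown in this post, so the following is only a sketch of what edge (replicate) padding to a target size might look like. `Grid` and `edge_pad` are hypothetical names, and the sketch assumes the extra cells are added at the bottom and right by repeating the last row and column, and that the target size is not smaller than the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<float>>;

// Hypothetical stand-in for edge_padding: grow `in` to new_rows x new_cols
// by repeating the last row and the last column (assumed placement:
// bottom and right). Assumes `in` is non-empty and rectangular.
Grid edge_pad(const Grid& in, std::size_t new_rows, std::size_t new_cols) {
    const std::size_t rows = in.size();
    const std::size_t cols = in[0].size();
    Grid out(new_rows, std::vector<float>(new_cols));
    for (std::size_t i = 0; i < new_rows; ++i) {
        for (std::size_t j = 0; j < new_cols; ++j) {
            // Clamp each index to the nearest valid cell of the input.
            out[i][j] = in[std::min(i, rows - 1)][std::min(j, cols - 1)];
        }
    }
    return out;
}
```

Padding a 2 x 2 matrix with rows (1, 2) and (3, 4) to 3 x 3 this way gives rows (1, 2, 2), (3, 4, 4), and (3, 4, 4).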

The batch_conv_test function takes a 4D matrix as input and applies the conv_test_with_output function to each 3D matrix in the batch. It returns a 4D matrix with the same batch size as the input and a depth equal to output_channels.

Conclusion

CNNs are a powerful tool for image and video analysis. The CNN computing pipeline consists of several stages, including input, convolution, pooling, and output. Convolution is the most important stage and applies a set of filters to extract features. In CNNs, we often work with 4D matrices to represent batches of examples. The functions provided in this blog can be used to implement the convolution stage of a CNN.