If you wanted to tackle an image-processing task with AI, how would you go about it? The answer is likely a convolutional neural network, a type of algorithm suited to processing images and other multidimensional data. Let’s explore how convolutional neural networks work in this article.
The Structure of Convolutional Neural Networks
You may have heard that neural networks consist of an input layer, hidden layers, and an output layer, where the hidden layers contain many neurons that perform calculations. But this picture isn’t the full story for convolutional neural networks (CNNs). The input, hidden, and output layers are still present, but the intermediate layers are different. Instead of layers made up only of plain weighted neurons, you’ll find these kinds of layers in the middle:
- Convolution layers
- Pooling layers
- Fully connected layers
1. Convolution Layers
When an image is fed into a convolutional layer, it undergoes a transformation known as a convolution. Essentially, a small matrix is slid across the image, changing it pixel by pixel. You can think of it as applying a filter (known as a kernel) to the image, which facilitates the extraction of features.
The image takes the shape (x, y, z), where x is the height, y is the width, and z is the number of data channels in the image. For greyscale images, z equals 1, and for RGB color images, z equals 3. For some image processing tasks, only a 2-dimensional convolution is used, which means splitting a color image into its three constituent data channels. In other cases, the kernel scans through the 3-dimensional space directly.
As the convolution kernel scans through each pixel in the space, it multiplies each number in the kernel by the corresponding number in the image (relative to the center). The products are then summed up to produce the value for that pixel, which is added to the convolved image. This may be difficult to grasp from text alone, so here is an image that shows the convolution process.
A filter scans through a matrix once to obtain the value of a single cell
The values in the kernel determine the transformation the image undergoes: some kernels can be used for edge detection, while others can be used for blurring. After passing through the kernel, the image data is passed on to the next layer.
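To make the multiply-and-sum operation concrete, here is a minimal sketch in NumPy (my own illustration, not from the article) that slides a 3×3 Sobel kernel, a classic edge-detection kernel, across a greyscale image with no padding and a stride of 1. Like most deep-learning libraries, it does not flip the kernel, so it is technically cross-correlation, but the idea is the same.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2D convolution (no padding, stride 1) for illustration only."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the kernel by the patch it covers and sum the products.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 3x3 Sobel kernel that responds strongly to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

image = np.random.rand(8, 8)        # a toy 8x8 greyscale image
edges = convolve2d(image, sobel_x)  # convolved image of shape (6, 6)
print(edges.shape)
```

Real CNN layers apply many such kernels at once and learn their values, but each one works exactly like this loop.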
2. Pooling Layers
After the image has passed through the convolutional layer, it might still be too large for the algorithm to process efficiently. Therefore, there needs to be some way to compress the image before it reaches the classical neural network, and this is where pooling layers come in.
When an image passes through a pooling layer, it is split into 2-dimensional patches of some given size (2×2, for example), separated by a step size known as the stride (e.g., 2 pixels). Each patch is then summarized by a function, such as taking the average or the maximum of its pixels.
A filter scans through a matrix once to obtain the value of a single cell
This works well because adjacent pixels in typical images are usually similar (or even identical): color transitions in real-world photos are smooth across most of the image, since photos are not randomly generated noise. This allows max pooling and average pooling to capture most of the content of the image, even though the image is greatly (and lossily) compressed in the process.
This effect is shown in the video below, where the values of adjacent pixels in an image of trees usually change smoothly (except at boundaries between objects).
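Here is an illustrative NumPy sketch of max pooling with 2×2 patches and a stride of 2; the input values and patch size are made up for the example, and swapping `patch.max()` for `patch.mean()` gives average pooling instead.

```python
import numpy as np

def max_pool2d(image, size=2, stride=2):
    """Max pooling: keep only the largest value in each size x size patch."""
    h, w = image.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + size,
                          j * stride:j * stride + size]
            out[i, j] = patch.max()  # use patch.mean() for average pooling
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 "image"
print(max_pool2d(image))
# [[ 5.  7.]
#  [13. 15.]]
```

Each 2×2 patch collapses to a single number, so the 4×4 input becomes a 2×2 output while keeping the dominant values in each region.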
3. Fully Connected Layers
Then, the data points are sent to the fully connected layer. This is essentially the layer you see in a classical neural network. In this layer, each “neuron” receives a weighted sum of its inputs and produces an output value, which is in turn weighted differently by the neurons in the next layer. Those weights are adjusted during training.
The fully connected layers are also where you’ll find the output layer. In image classification, there is one output neuron per category: a binary classifier, for instance, has an output layer of size 2, with each neuron outputting a confidence value for its class. The class with the largest confidence value is then selected as the final decision.
Aside from the fully connected layers, there could also be other types of layers, such as normalization and dropout layers. In fact, a CNN works just like a normal neural network but with convolution and pooling layers added.
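To show how the pieces fit together, here is a minimal, hypothetical CNN written with PyTorch (the framework, layer sizes, and channel counts are my own illustrative choices, not from the article), assuming 28×28 greyscale inputs and a binary classification output of size 2.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Illustrative CNN: convolution -> pooling -> fully connected layers."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution layer
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),       # pooling layer
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 7 * 7, 32),   # fully connected layer
            nn.ReLU(),
            nn.Linear(32, num_classes),  # output layer: one neuron per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyCNN()
x = torch.randn(1, 1, 28, 28)      # one 28x28 greyscale image
logits = model(x)
prediction = logits.argmax(dim=1)  # pick the class with the largest score
print(logits.shape, prediction)
```

The two pooling layers shrink the 28×28 image to 7×7 before it reaches the fully connected layers, which is exactly the compression role described above.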
Why Are Convolutional Neural Networks So Useful?
CNNs are useful because they are adept at extracting features from images. As mentioned above, convolution kernels can detect edges and other features, and by stacking many of these kernels in its filters, a convolutional layer can isolate or highlight many different features or objects.
Remember that a neural network needs to extract concrete features to learn the patterns in the data. Some algorithms, particularly deep-learning ones, can extract features automatically. The convolution layer is one way to handle this feature extraction: the parameters in the kernels are optimized during training so that the extracted features are as distinct and as easy to learn as possible. Because the convolution process works well on images, convolutional neural networks usually do well in computer vision and are widely used for analyzing images.
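As a small illustration of this, the following PyTorch sketch (again, the framework, shapes, and loss function are illustrative assumptions, not the article's method) runs a single optimization step and shows that the kernel values of a convolution layer are learnable parameters that change during training.

```python
import torch
import torch.nn as nn

# A single convolution layer; its kernel values are learnable parameters.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)

before = conv.weight.detach().clone()

# One illustrative training step on random data with a dummy target.
x = torch.randn(1, 1, 8, 8)
target = torch.randn(1, 4, 6, 6)
loss = nn.functional.mse_loss(conv(x), target)
loss.backward()
optimizer.step()

# The kernel values have moved: the features the layer extracts are learned.
print(torch.allclose(before, conv.weight))  # almost surely False
```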
Conclusion
To conclude, a convolutional neural network is just a normal neural network with convolution and pooling layers that process the data and extract features. This makes it well suited to computer vision, as these layers are among the most efficient ways to find the features that distinguish one image from another. If you would like more information, please take a look at the articles in the references below.
References
- GeeksforGeeks. (2023, December 20). Introduction to Convolution Neural Network. Retrieved January 31, 2024, from https://www.geeksforgeeks.org/introduction-convolution-neural-network/
- Balachandran, S. (2020, March 21). Machine Learning – Convolution with color images. Retrieved January 31, 2024, from https://dev.to/sandeepbalachandran/machine-learning-convolution-with-color-images-2p41
- NVIDIA Developer. (n.d.). Convolution. Retrieved January 31, 2024, from https://developer.nvidia.com/discover/convolution
- Pandey, A. K. (2020, June 25). Convolution, Padding, Stride, and Pooling in CNN – Medium. Retrieved January 31, 2024, from https://medium.com/analytics-vidhya/convolution-padding-stride-and-pooling-in-cnn-13dc1f3ada26