In 2012, a new deep learning model decisively won the annual ILSVRC computer vision competition. That model, AlexNet, was a convolutional neural network (CNN). CNNs follow the same broad process as other supervised learning methods: they take images as input, extract features from each of them, and pass those features to a classifier. The crucial difference is that the features are learned automatically. During training, the CNN does the hard work of extracting and describing features itself, and classification error is minimized to improve both the classifier parameters and the features.

What is CNN?

As a sub-category of neural networks, convolutional neural networks share all of the general properties of neural networks. Unlike general-purpose networks, however, CNNs were designed primarily to take images as input. As a result, their design is more straightforward: a CNN comprises only two primary building blocks. The first block is what makes this type of network distinctive: it serves as a feature extractor, performing template matching through convolution filtering. The first filtering layer applies a variety of convolution kernels to the image and returns "feature maps", which are then normalized and resized. These feature maps can be filtered again with fresh kernels, normalized and resized, and the procedure repeated as many times as necessary. Finally, the values from all feature maps are combined into a single vector. This vector is the output of the first block and the input to the second.

Layers of CNN

A convolutional neural network is built from several distinct layer types: convolutional, pooling, activation (ReLU), dropout, and finally, the fully connected layer.

Convolutional Layer

  • This is the core building block of a CNN. It applies a set of learnable filters (also known as kernels) to the input data.
  • These filters are small grids that slide over the input image to perform element-wise multiplications and additions.
  • Each filter extracts specific features from the input data, such as edges, textures, or more complex patterns.
  • Multiple filters are used to capture different features.
  • The output of this layer is called feature maps.
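The sliding filter described above can be sketched in a few lines of NumPy. This is a minimal illustration (valid padding, stride 1, a hand-picked vertical-edge kernel rather than a learned one), not a production implementation:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (valid padding, stride 1) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A tiny image with a sharp vertical edge between dark and bright halves.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A hand-crafted vertical-edge filter; in a real CNN these values are learned.
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, edge_kernel)  # responds strongly along the edge
```

The resulting feature map is large wherever the image intensity changes from left to right, which is exactly the "feature" this particular kernel detects.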

Pooling Layer

  • Pooling layers reduce the spatial dimensions of the feature maps while retaining the most important information.
  • This helps in reducing computation and making the model translation invariant.
  • Max-pooling and average-pooling are common pooling techniques.
  • Max-pooling, for example, selects the maximum value within a small region of the feature map, reducing the size and introducing translational invariance.
  • Pooling helps reduce the computational complexity of the network and makes the model more robust to small shifts in the input data.
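Max-pooling can be sketched just as simply. This minimal version assumes a square window with stride equal to the window size, the most common configuration:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Take the maximum value in each size-by-size window of the feature map."""
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 2],
                 [0, 2, 5, 7],
                 [1, 1, 3, 4]], dtype=float)

pooled = max_pool(fmap)  # the 4x4 map shrinks to 2x2
```

Each output value only records that a strong activation occurred somewhere in its window, which is why small shifts of the input leave the pooled output largely unchanged.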

Activation Layer

  • After each convolutional layer, an activation function is applied element-wise to the feature maps.
  • It introduces non-linearity into the model which is essential for the network to learn complex patterns.
  • The most common activation function used in CNNs is the Rectified Linear Unit (ReLU).
  • ReLU activation function replaces negative values with zero and leaves positive values unchanged, introducing non-linearity into the model.
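ReLU is a one-liner, which is part of why it is so popular:

```python
import numpy as np

def relu(x):
    """Replace negative values with zero; leave positive values unchanged."""
    return np.maximum(x, 0.0)

fmap = np.array([[-2.0, 1.5],
                 [ 0.0, -0.5]])
activated = relu(fmap)  # [[0. , 1.5], [0. , 0. ]]
```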

Dropout Layer

  • Dropout is a regularization technique used to prevent overfitting.
  • During training, a fraction of randomly selected neurons (typically set as a hyperparameter) is temporarily "dropped out" or ignored.
  • It prevents the network from relying too heavily on specific neurons and features.
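A common way to implement this is "inverted dropout", sketched below: surviving activations are scaled up by 1/(1-rate) during training so that the expected activation stays the same, and inference is a simple pass-through. The fixed random seed here is only for reproducibility of the example:

```python
import numpy as np

def dropout(x, rate, training=True, rng=None):
    """Randomly zero a fraction `rate` of activations during training,
    scaling survivors by 1/(1-rate) so the expected value is unchanged.
    At inference time, return the input untouched."""
    if not training or rate == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for a reproducible demo
    mask = rng.random(x.shape) >= rate  # True for neurons that are kept
    return x * mask / (1.0 - rate)

activations = np.ones(1000)
trained = dropout(activations, rate=0.5, training=True)   # ~half zeroed, rest scaled to 2.0
served = dropout(activations, rate=0.5, training=False)   # identical to the input
```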

Fully Connected Layer (Dense Layer)

  • The Fully Connected (FC) layer consists of neurons along with their weights and biases.
  • These layers connect every neuron in one layer to every neuron in the next layer.
  • They are typically used in the final layers of the CNN for classification or regression tasks.
  • It is usually placed just before the output layer and maps the extracted features to the network's final predictions.
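At its core, a fully connected layer is a single matrix multiplication plus a bias, as in this minimal sketch (the sizes here, 8 inputs and 3 outputs, are arbitrary for illustration):

```python
import numpy as np

def dense(x, W, b):
    """Every input connects to every output: y = xW + b."""
    return x @ W + b

rng = np.random.default_rng(42)
x = rng.standard_normal(8)         # a flattened feature vector
W = rng.standard_normal((8, 3))    # 8 inputs fully connected to 3 outputs
b = np.zeros(3)
logits = dense(x, W, b)            # one raw score per class
```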
Some other layers in CNN are the Flatten, Input, and Output layers.

  • Flatten Layer: Before the fully connected layers, the feature maps are typically flattened into a one-dimensional vector. This is done to match the dimensionality between the convolutional/pooling layers and the fully connected layers.
  • Input Layer: This layer represents the raw input data, typically images. Each neuron in this layer corresponds to a pixel in the input image.
  • Output Layer: The final layer in a CNN produces the output. The number of neurons in this layer depends on the specific task, e.g., one neuron for binary classification or several neurons for multi-class classification.

These layers are typically stacked sequentially to form the architecture of the CNN.
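Putting the pieces together, a forward pass through a tiny CNN can be sketched end to end. Everything here is illustrative: the input is random noise standing in for an image, the filter and dense weights are random rather than trained, and the network has just one of each layer type:

```python
import numpy as np

def conv2d(img, k):
    # Valid convolution, stride 1.
    kh, kw = k.shape
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * k)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def max_pool(f, s=2):
    # Non-overlapping s-by-s max-pooling.
    return np.array([[f[i:i+s, j:j+s].max()
                      for j in range(0, f.shape[1] - s + 1, s)]
                     for i in range(0, f.shape[0] - s + 1, s)])

rng = np.random.default_rng(0)
image = rng.random((8, 8))             # input layer: an 8x8 "image"
kernel = rng.standard_normal((3, 3))   # one (untrained) convolution filter

x = np.maximum(conv2d(image, kernel), 0.0)  # convolution + ReLU -> 6x6
x = max_pool(x)                             # pooling -> 3x3
x = x.ravel()                               # flatten -> vector of 9 values

W = rng.standard_normal((9, 2))  # fully connected layer: 9 inputs, 2 classes
b = np.zeros(2)
logits = x @ W + b
probs = np.exp(logits) / np.exp(logits).sum()  # output layer: softmax over 2 classes
```

Training would then adjust `kernel`, `W`, and `b` to reduce the classification error, which is exactly the process described in the introduction.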

Conclusion

In conclusion, Convolutional Neural Networks (CNNs) are a remarkable innovation in the field of deep learning. They give computers a powerful way to recognize and process pictures. We've taken apart the different layers of a CNN, from how they first see pictures to how they find important details. This knowledge helps us see how they're used in amazing technology, like self-driving cars and medical equipment. It's a step forward in making computers even smarter.