Understanding VGG Models: A Breakthrough in Deep Learning

The Visual Geometry Group (VGG), led by Karen Simonyan and Andrew Zisserman, introduced the groundbreaking concept of very deep convolutional networks for large-scale image recognition in 2014. Their work investigated how increasing the depth of convolutional networks affects accuracy on large-scale image recognition tasks. VGG took first place in the localization task and second place in the classification task of the ImageNet ILSVRC 2014 challenge, setting a new state-of-the-art benchmark in both image classification and localization.

Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition (arXiv:1409.1556)

Key Innovations of VGG

1. Using Very Small Kernel Size

VGG uses a small 3 x 3 kernel in its convolutional layers rather than the 7 x 7 kernels often used in earlier architectures. This approach offers several advantages:

  1. Improved Feature Representation

    Smaller kernels allow the network to capture finer details, and stacking them inserts extra ReLU non-linearities between layers, resulting in a more discriminative decision function

  2. Fewer Parameters

    Three consecutive 3 x 3 kernels achieve the same receptive field as a single 7 x 7 kernel, but with significantly fewer parameters: for C input and output channels, the stack uses 27C^2 weights versus 49C^2 for the single kernel. This reduces computational cost and improves efficiency, as the sketch below illustrates
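
To see the difference concretely, here is a minimal tf.keras sketch that counts the parameters of both options. The channel count C = 64 is an arbitrary choice for illustration, not a value fixed by the paper:

```python
import tensorflow as tf

C = 64  # example channel count, chosen arbitrarily for illustration

# Three stacked 3x3 convolutions: same receptive field as one 7x7.
stacked = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, C)),
    tf.keras.layers.Conv2D(C, 3, padding="same", use_bias=False),
    tf.keras.layers.Conv2D(C, 3, padding="same", use_bias=False),
    tf.keras.layers.Conv2D(C, 3, padding="same", use_bias=False),
])

# A single 7x7 convolution covering the same receptive field.
single = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, C)),
    tf.keras.layers.Conv2D(C, 7, padding="same", use_bias=False),
])

print(stacked.count_params())  # 3 * (3*3*C*C) = 27C^2 = 110,592
print(single.count_params())   # 7*7*C*C       = 49C^2 = 200,704
```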

2. ReLU Activation Function

VGG employs the Rectified Linear Unit (ReLU) as the activation function across all hidden layers, replacing older activations such as tanh. ReLU accelerates convergence during training and improves gradient flow, making it a more efficient choice for deep architectures.
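
ReLU simply clips negative values to zero, relu(x) = max(0, x). A quick TensorFlow check (the input values here are made up for illustration):

```python
import tensorflow as tf

# ReLU zeroes out negative activations: relu(x) = max(0, x)
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
print(tf.nn.relu(x).numpy())  # -> [0., 0., 0., 1.5, 3.]
```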

3. Regularization Using Dropout and L2

To prevent overfitting in such a deep and complex architecture, VGG implements the following regularization methods (a combined sketch follows the list):

  1. Dropout

    Randomly drops units from fully connected layers during training, reducing co-adaptation of neurons

  2. L2 Regularization

    Adds a penalty proportional to the squared magnitude of the weights, encouraging the network to learn simpler models
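
Both techniques are easy to express in tf.keras. This minimal sketch of the fully connected head uses the dropout rate (0.5) and L2 multiplier (5e-4) reported in the paper; the variable name classifier_head is mine:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# VGG-style fully connected head: dropout (rate 0.5) on the first two
# dense layers plus an L2 weight penalty (5e-4), matching the paper.
classifier_head = tf.keras.Sequential([
    layers.Dense(4096, activation="relu",
                 kernel_regularizer=regularizers.l2(5e-4)),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu",
                 kernel_regularizer=regularizers.l2(5e-4)),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),  # 1000 ImageNet classes
])
```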

Architecture

VGG comes in several versions based on the layer configuration; the full configurations are listed in Table 1 of the paper.

The versions range from 11 to 19 weight layers and are known as VGG11, VGG13, VGG16, and VGG19, with the following total parameter counts (from Table 2 of the paper):

  • VGG11: 11 weight layers, about 133 million parameters

  • VGG13: 13 weight layers, about 133 million parameters

  • VGG16: 16 weight layers, about 138 million parameters

  • VGG19: 19 weight layers, about 144 million parameters
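
To make the layout concrete, here is a minimal tf.keras sketch of VGG16 (configuration D). The block structure (2-2-3-3-3 convolutions with 64 to 512 channels) follows the paper, while the helper name vgg16 is mine:

```python
import tensorflow as tf
from tensorflow.keras import layers

def vgg16(num_classes=1000):
    """Sketch of the VGG16 (configuration D) layer layout."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(224, 224, 3))])
    # (number of 3x3 convs, channel width) for each of the five blocks
    for num_convs, channels in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
        for _ in range(num_convs):
            model.add(layers.Conv2D(channels, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))  # halve the spatial size after each block
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(4096, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

vgg16().summary()  # total params roughly 138 million, as in the paper
```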

Summary

The VGG models pushed the boundaries of convolutional neural networks by:

  • Increasing depth

    Demonstrating that deeper networks can achieve better accuracy when paired with smaller kernels

  • Efficient design

    Reducing parameters while maintaining performance through stacked 3 x 3 convolutions

What next?

In the next post, we will build VGG from scratch using TensorFlow. This exercise will deepen our understanding of VGG and sharpen our practical skills.

Feel free to ask questions or share your thoughts in the comments. I'd love to hear your feedback and discuss this further!