Exploring Residual Networks (ResNet): A Breakthrough in Deep Learning

Overview

The paper “Deep Residual Learning for Image Recognition”, authored by Kaiming He, Xiangyu Zhang, and others in 2015, introduces a new CNN-based architecture known as ResNet. This paper became one of the breakthroughs in the development of deep learning, especially in the design of CNN architectures. It presents the Residual Block, which prevents degradation in deeper networks. With this building block, we can build deeper networks without the degradation and poor performance that plain deep networks suffer from.

Paper: https://arxiv.org/pdf/1512.03385

Residual Block

Building deeper networks is one of the best ways to gain more accuracy on image classification tasks. But the deeper the network, the more likely it is to perform worse than a shallower one. This degradation is related to the vanishing/exploding gradient problem. Many regularization methods, such as dropout and L1/L2 regularization, have already been applied to counter it, but these methods are not guaranteed to prevent the problem.

The authors present residual learning, realized as the residual block, to address this problem. The idea behind this approach is that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. In the extreme case, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping with a stack of nonlinear layers.

$$H(x) = F(x) + x$$
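As a minimal sketch (not code from the paper), the snippet below illustrates this extreme case: when the residual branch F has all-zero weights, the block output H(x) = F(x) + x reduces exactly to the identity mapping. The layer sizes here are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy illustration (an assumption for exposition, not the paper's code):
# if the residual branch F is pushed to zero, the block output
# H(x) = F(x) + x collapses to the identity mapping, which a plain stack
# of nonlinear layers would otherwise have to learn explicitly.
residual_branch = nn.Linear(4, 4)
nn.init.zeros_(residual_branch.weight)
nn.init.zeros_(residual_branch.bias)

x = torch.randn(2, 4)
h = residual_branch(x) + x   # H(x) = F(x) + x
print(torch.allclose(h, x))  # True: zero residual -> identity mapping
```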

Experimentation

The paper compares VGG-19, a plain network, and a residual network. The differences between these networks can be seen in the figure from the paper below.

The experiments are conducted on the ImageNet dataset and the CIFAR-10 dataset, with the results shown below.

From the results we can see that ResNet has lower error than the plain network with the same number of parameters. We can also see that, for plain networks, the deeper network performs worse than the shallower one, whereas for ResNet the deeper network performs better than the shallow one. We can conclude that the residual block improves accuracy in deeper networks by preventing degradation.

Comparing the results in Table 3, the deepest network, ResNet-152, achieves the best performance on the ImageNet dataset.

Residual Block Method

1. Basic Residual Block (Identity Shortcut)

A residual block consists of a few stacked layers, typically two or three. For a basic block with two layers:

$$y = F(x, \{W_i\}) + x$$

F(x, {Wi}) : the residual mapping to be learned, consisting of layers such as convolutions and activations

x : input to the block

y : output of the block

This method is suitable when the input and output dimensions of the block are the same.
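A minimal PyTorch sketch of such a basic block is shown below (an illustration, not the authors' original code); the channel count and input size in the example are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 conv layers plus an identity shortcut (input/output shapes match)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # first conv + activation
        out = self.bn2(self.conv2(out))          # second conv (no activation yet)
        out = out + x                            # identity shortcut: y = F(x) + x
        return F.relu(out)                       # activation applied after the addition

# Example: shapes are unchanged, so the identity addition is valid
block = BasicBlock(64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```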

2. Residual Block with Projection Shortcut

This method modifies the shortcut connection when the input and output dimensions differ. It uses a linear projection to match the dimensions:

$$y = F(x, \{W_i\}) + W_s x$$

  • Ws : a learnable linear projection, typically a 1 x 1 convolution

  • The projection adjusts the dimensions of x to match those of F(x)

This method is used when the block changes the feature map dimensions (e.g., increasing the number of channels or downsampling).
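A PyTorch sketch of a projection-shortcut block might look like the following; the stride and channel numbers are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionBlock(nn.Module):
    """Residual block whose shortcut is a 1x1 convolution (Ws) that matches
    the changed spatial size / channel count of F(x)."""
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Ws: projection shortcut (1x1 conv) so the addition is dimensionally valid
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # y = F(x) + Ws x

# Example: 64 -> 128 channels while halving the spatial resolution
block = ProjectionBlock(64, 128)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 128, 28, 28])
```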

3. Bottleneck Residual Block

For deeper networks with hundreds of layers, a bottleneck design is used to reduce computational complexity while maintaining performance:

$$F(x) = W_3 \, \sigma(W_2 \, \sigma(W_1 x))$$

W1, W2, W3 : convolutional layers with 1 x 1, 3 x 3, and 1 x 1 filters, respectively; σ denotes the ReLU activation

The 1 x 1 convolutions reduce and then restore the dimensions, while the 3 x 3 convolution operates on the reduced dimensions.

The shortcut can be identity or projection based, depending on whether the dimensions match.

This design reduces the number of parameters and computational cost, enabling efficient training of very deep networks. It is used in ResNet-50, ResNet-101, and ResNet-152.
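A PyTorch sketch of a bottleneck block, assuming the common 4x channel expansion used in ResNet-50/101/152, could look like this (illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck: the first 1x1 conv reduces the channels,
    the 3x3 conv works on the reduced representation, and the last 1x1 conv
    restores them. The 4x expansion factor is an assumption matching the
    common ResNet-50/101/152 configuration."""
    expansion = 4

    def __init__(self, in_channels: int, mid_channels: int):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1, bias=False)   # reduce
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, 1, bias=False)  # restore
        self.bn3 = nn.BatchNorm2d(out_channels)
        # Identity shortcut when dimensions match, otherwise a 1x1 projection
        self.shortcut = (
            nn.Identity()
            if in_channels == out_channels
            else nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + self.shortcut(x))

# Example: 256 -> 64 -> 256 channels, as in the first stage of ResNet-50
block = BottleneckBlock(256, 64)
print(block(torch.randn(1, 256, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```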

Summary

The paper “Deep Residual Learning for Image Recognition” (2015), authored by Kaiming He, Xiangyu Zhang, and others, introduces ResNet, a CNN-based architecture that revolutionized deep learning by addressing performance degradation in deep networks. The key innovation of ResNet is the Residual Block, which mitigates the vanishing/exploding gradient problem and allows for the training of much deeper networks.