Learning to Build ResNet for Intel Scene Classification from Scratch

Overview

This is a scene classification dataset published by Intel, the same dataset used in the previous implementation. It consists of 6 classes covering different categories: buildings, forest, glacier, mountain, sea, and street. The dataset has around 25,000 images of size 150 × 150 and is published on Kaggle, where it can be found at the link below.

Link: Intel Image Classification

In this article we will build ResNet from scratch based on the original residual network paper. We will also use wandb to monitor our model's performance.

Code

The notebook source can be downloaded at the end of the article. I am using a Kaggle notebook with 2 x T4 GPUs to speed up training. The runtime is up to 3 hours.

Import Libraries and dependencies

!pip install --upgrade wandb
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import datetime
from wandb.integration.keras import WandbMetricsLogger
import wandb

Login to Wandb

wandb.login(key="#")

Make sure to use your own account and assign your API key.
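
Hard-coding the key works, but on Kaggle it is cleaner to store it as a notebook secret and read it at runtime. A small sketch, assuming a secret named WANDB_API_KEY has been added to the notebook:

# Read the wandb API key from Kaggle's secret store instead of hard-coding it
from kaggle_secrets import UserSecretsClient
wandb.login(key=UserSecretsClient().get_secret("WANDB_API_KEY"))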

EDA and Data Preprocessing

Load Dataset

config = {
    "BATCH_SIZE" :32,
    "IMG_SIZE":150,
    "VAL_SPLIT":0.1,
    "EPOCHS":100,
    "PATIENCE":5
}

# Class names follow the sub-directory names in the dataset
class_names = ["buildings", "forest", "glacier", "mountain", "sea", "street"]

train_data = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    shuffle=True,
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="training",
    interpolation="bilinear",
    verbose=True
)

valid_data = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="validation",
    interpolation="bilinear",
    verbose=True
)

test_data = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_test/seg_test",
    labels='inferred',
    label_mode='categorical',
    class_names=class_names,
    color_mode='rgb',
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    interpolation='bilinear',
    verbose=True
)

In this practice we use TensorFlow and load the images with image_dataset_from_directory, which converts the data into a tf.data pipeline, a more optimized input pipeline.
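
A quick sanity check on the resulting pipeline is to print its class names and element spec. A small sketch using the datasets loaded above:

# Inspect the inferred classes and the shape of each batch
print(train_data.class_names)   # ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
print(train_data.element_spec)  # image batches of (None, 150, 150, 3) and one-hot labels of (None, 6)
for images, labels in train_data.take(1):
    print(images.shape, labels.shape)  # (32, 150, 150, 3) (32, 6)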

Visualize Data Distribution

In order to gain more information about the data, we visualize the distribution of each class in the dataset and check whether it is imbalanced.

def get_data_distribution(datasets):
  # Collect the integer label of every sample in every dataset, then count per class
  labels = []
  for dataset in datasets:
    for _, label in dataset:
      labels.extend(tf.argmax(label, axis=1).numpy())
  y, idx, count = tf.unique_with_counts(labels)
  return y, count



def visualize_data(class_data,count):
  plt.figure(figsize=(10,10))
  plt.bar(class_data,count,align="center")
  plt.tight_layout()
  plt.title("Class Distribution")
  plt.ylabel("Number of Data")
  plt.xlabel("Class")
  plt.show()

class_data, counts = get_data_distribution([train_data,valid_data,test_data])
visualize_data([class_names[cd] for cd in class_data],counts.numpy())

From the plot we can see that the class distribution is balanced.

Visualize Sample Image

To inspect the dataset, we can visualize a few samples with matplotlib:

plt.figure(figsize=(10,10))
for img,label in train_data.take(1):
  for i in range(16):
    ax = plt.subplot(4,4,i+1)
    plt.imshow(img[i].numpy().astype("uint8"))
    plt.title(f"Label :{class_names[tf.argmax(label[i])]}")
    plt.axis(False)
plt.tight_layout()
plt.show()

From these samples we can see what each label and scene image looks like.

Data Augmentation

To add more variation to the data, we can apply several augmentation methods to the training dataset.

a. Random Horizontal Flip

def img_random_flip(img):
    # tf.image.flip_left_right mirrors images horizontally; the random variant
    # only flips with 50% probability, which is what we want for augmentation
    return tf.image.random_flip_left_right(img)


test_flip = (train_data.map(lambda img, label: img_random_flip(img)))
plt.figure(figsize=(6,6))
for img in test_flip.take(1):
    for num in range(16):
        ax = plt.subplot(4,4,num+1)
        plt.imshow((img[num].numpy()).astype("uint8"))
        plt.axis("off")  
plt.suptitle("Flip Augmentation")
plt.tight_layout()
plt.show()

b. Random Contrast

def img_random_contrast(img):
    return  tf.image.random_contrast(img, 0.95,1 )

test_random_contrast = (train_data.map(lambda img, label: img_random_contrast(img)))
plt.figure(figsize=(6,6))
for img in test_random_contrast.take(1):
    for num in range(16):
        ax = plt.subplot(4,4,num+1)
        plt.imshow((img[num].numpy()).astype("uint8"))
        plt.axis("off")  
plt.suptitle("Random Contrast")
plt.tight_layout()
plt.show()

c. Random Brightness

def img_random_brightness(img):
    return tf.image.random_brightness(img, .2)

test_random_brightness = (train_data.map(lambda img, label: img_random_brightness(img)))
plt.figure(figsize=(6,6))
for img in test_random_brightness.take(1):
    for num in range(16):
        ax = plt.subplot(4,4,num+1)
        plt.imshow((img[num].numpy()).astype("uint8"))
        plt.axis("off")  
plt.suptitle("Random Brightness")
plt.tight_layout()
plt.show()

d. Random Hue

def random_hue(img):
    return tf.image.random_hue(img, 0.05)

test_random_hue = (train_data.map(lambda img, label: random_hue(img)))
plt.figure(figsize=(6,6))
for img in test_random_hue.take(1):
    for num in range(16):
        ax = plt.subplot(4,4,num+1)
        plt.imshow((img[num].numpy()).astype("uint8"))
        plt.axis("off")  
plt.suptitle("Random Hue")
plt.tight_layout()
plt.show()

e. Implementing the augmentation and normalization on each dataset

def image_augmentation(img,label):
  img,label = normalize_img(img,label)
  img = img_random_flip(img)
  img = img_random_contrast(img)
  img = img_random_brightness(img)
  img = random_hue(img)
  return img,label

def normalize_img(img,label):
  img = tf.cast(img,tf.float32)/255.0
  return img,label

train_dataset = (
    train_data
    .cache()  # cache the raw images first...
    .map(lambda img, label: image_augmentation(img, label), num_parallel_calls=tf.data.AUTOTUNE)  # ...so the random augmentation is re-applied every epoch
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)

valid_dataset = (
    valid_data
    .map(lambda img, label: normalize_img(img, label), num_parallel_calls=tf.data.AUTOTUNE)  
    .cache()  
    .prefetch(buffer_size=tf.data.AUTOTUNE)  
)

# Optimization for test_data
test_dataset = (
    test_data
    .map(lambda img, label: normalize_img(img, label), num_parallel_calls=tf.data.AUTOTUNE) 
    .prefetch(buffer_size=tf.data.AUTOTUNE) 
)

f. Check the augmented training dataset

plt.figure(figsize=(12, 12))
for img, label in train_dataset.take(1):
    for num in range(16): 
        ax = plt.subplot(4, 4, num + 1)        
        img_to_display = (img[num].numpy() * 255).clip(0, 255).astype("uint8")
        plt.imshow(img_to_display)
        plt.axis("off")  
plt.suptitle("Augmented Dataset")
plt.tight_layout()
plt.show()

From the augmentation we can see that the images don't change too much, but we get different variations of them. We should also check that the augmented data actually differs from the original training data.
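
One way to do that check is to run a single raw batch through image_augmentation and plot the original and augmented versions of the same images side by side. A small sketch using the functions defined above:

# Take one raw batch, run it through the augmentation pipeline, and compare side by side
for raw_img, raw_label in train_data.take(1):
    aug_img, _ = image_augmentation(raw_img, raw_label)
    plt.figure(figsize=(8, 4))
    for i in range(4):
        plt.subplot(2, 4, i + 1)
        plt.imshow(raw_img[i].numpy().astype("uint8"))
        plt.axis("off")
        plt.subplot(2, 4, i + 5)
        plt.imshow((aug_img[i].numpy() * 255).clip(0, 255).astype("uint8"))
        plt.axis("off")
    plt.suptitle("Original (top) vs augmented (bottom)")
    plt.tight_layout()
    plt.show()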

Building Model ResNet

ResNet uses two types of blocks: the residual (basic) block and the bottleneck block. They follow different strategies and are used in different scenarios; check the previous post for the differences between these blocks. In this article we will build five different ResNet models.
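
At the core of both blocks is the same idea: learn a residual mapping F(x) and add the input back through a skip connection before the final activation. As a toy sketch (not one of the classes defined below):

def residual_output(x, residual_fn):
    # residual_fn(x) is the learned residual F(x); the identity shortcut adds x back
    return tf.nn.relu(residual_fn(x) + x)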

Residual Block

class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, filters, kernel_size=3, strides=1, downsample=False):
        super(ResidualBlock, self).__init__()
        self.downsample = downsample
        self.conv1 = tf.keras.layers.Conv2D(filters, kernel_size, strides=strides, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv2 = tf.keras.layers.Conv2D(filters, kernel_size, strides=1, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn2 = tf.keras.layers.BatchNormalization()

        if downsample:
            self.downsample_conv = tf.keras.layers.Conv2D(filters, 1, strides=strides,kernel_regularizer=tf.keras.regularizers.L2(0.001))
            self.downsample_bn = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=False):
        residual = inputs
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)

        if self.downsample:
            residual = self.downsample_conv(inputs)
            residual = self.downsample_bn(residual, training=training)
        x += residual
        return self.relu(x)

This block is used in the shallower ResNet architectures and has more parameters than the bottleneck block.

Bottleneck Block

class BottleneckBlock(tf.keras.layers.Layer):
    def __init__(self, filters, strides=1, downsample=False):
        super(BottleneckBlock, self).__init__()
        self.downsample = downsample
        self.conv1 = tf.keras.layers.Conv2D(filters // 2, 1, strides=1, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters // 2, 3, strides=strides, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(filters, 1, strides=1, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn3 = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()

        if downsample:
            self.downsample_conv = tf.keras.layers.Conv2D(filters, 1, strides=strides,kernel_regularizer=tf.keras.regularizers.L2(0.001))
            self.downsample_bn = tf.keras.layers.BatchNormalization()

    def call(self, inputs, training=False):
        residual = inputs
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)

        x = self.conv2(x)
        x = self.bn2(x, training=training)
        x = self.relu(x)

        x = self.conv3(x)
        x = self.bn3(x, training=training)

        if self.downsample:
            residual = self.downsample_conv(inputs)
            residual = self.downsample_bn(residual, training=training)

        x += residual
        return self.relu(x)

This block is used for the deeper ResNet variants. It allows us to build deeper networks with fewer parameters.
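
To make the parameter difference concrete, we can build one block of each type on the same 256-channel input and compare their sizes. A rough sketch; the exact counts depend on the input channels:

# Compare the parameter count of one residual block vs one bottleneck block at 256 filters
inp = tf.keras.Input(shape=(19, 19, 256))
res_block = ResidualBlock(256)
btl_block = BottleneckBlock(256)
_ = res_block(inp)  # call once so the weights are created
_ = btl_block(inp)
print("Residual block params  :", res_block.count_params())   # roughly 1.2M
print("Bottleneck block params:", btl_block.count_params())   # roughly 0.2M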

ResNet

class ResNet(tf.keras.Model):
    def __init__(self, num_classes, block_counts,blocktype, initial_filters=64):
        super(ResNet, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(initial_filters, 7, strides=2, padding="same",kernel_regularizer=tf.keras.regularizers.L2(0.001))
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.pool = tf.keras.layers.MaxPooling2D(pool_size=3, strides=2, padding="same")

        # Stack the residual/bottleneck blocks stage by stage in a Sequential container
        self.residual_blocks = tf.keras.Sequential(name="residual_blocks")
        filters = initial_filters
        for i, count in enumerate(block_counts):
            for j in range(count):
                strides = 2 if j == 0 and i > 0 else 1  # Downsample at the start of a new stage
                self.residual_blocks.add(
                    blocktype(filters, strides=strides, downsample=(strides == 2))
                )
            filters *= 2

        self.global_pool = tf.keras.layers.GlobalAveragePooling2D()
        self.dropout = tf.keras.layers.Dropout(0.3)
        self.fc = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs, training=False):
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = self.relu(x)
        x = self.pool(x)
        x = self.residual_blocks(x, training=training)  # Pass through all residual blocks
        x = self.global_pool(x)
        x = self.dropout(x,training=training)
        return self.fc(x)

We build the ResNet using the Keras model subclassing API, which gives us more flexibility over the models.

ResNet 18

def build_resnet18(num_classes, input_shape=(150, 150, 3)):
    return ResNet(
        num_classes=num_classes,
        block_counts=[2, 2, 2, 2],  # From the table for ResNet-18
        blocktype=ResidualBlock,  # Use basic blocks
        initial_filters=64,
    )

resnet18 = build_resnet18(num_classes=6)
dummy_input = tf.keras.Input(shape=(150, 150, 3))
resnet18(dummy_input)  # Build the model
resnet18.summary()

ResNet 34

def build_resnet34(num_classes, input_shape=(150, 150, 3)):
    return ResNet(
        num_classes=num_classes,
        block_counts=[3, 4, 6, 3],  # From the table for ResNet-34
        blocktype=ResidualBlock,  # Use basic blocks
        initial_filters=64,
    )

resnet34 = build_resnet34(num_classes=len(class_names))
dummy_input = tf.keras.Input(shape=(150, 150, 3))
resnet34(dummy_input)  # Build the model
resnet34.summary()

ResNet 50

def build_resnet50(num_classes, input_shape=(150, 150, 3)):
    return ResNet(
        num_classes=num_classes,
        block_counts=[3, 4, 6, 3],  # From the table for ResNet-50
        blocktype=BottleneckBlock,  # Use bottleneck blocks
        initial_filters=64,
    )


resnet50 = build_resnet50(num_classes=len(class_names))
dummy_input = tf.keras.Input(shape=(150, 150, 3))
resnet50(dummy_input)  # Build the model
resnet50.summary()

ResNet 101

def build_resnet101(num_classes, input_shape=(150, 150, 3)):
    return ResNet(
        num_classes=num_classes,
        block_counts=[3, 4, 23, 3],  # From the table for ResNet-101
        blocktype=BottleneckBlock,  # Use bottleneck blocks
        initial_filters=64,
    )


resnet101 = build_resnet101(num_classes=len(class_names))
dummy_input = tf.keras.Input(shape=(150, 150, 3))
resnet101(dummy_input)  # Build the model
resnet101.summary()
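
ResNet 152

The training loop below also expects a build_resnet152 builder. Following the same pattern and the paper's [3, 8, 36, 3] block configuration for ResNet-152, a minimal sketch:

def build_resnet152(num_classes, input_shape=(150, 150, 3)):
    return ResNet(
        num_classes=num_classes,
        block_counts=[3, 8, 36, 3],  # From the table for ResNet-152
        blocktype=BottleneckBlock,  # Use bottleneck blocks
        initial_filters=64,
    )


resnet152 = build_resnet152(num_classes=len(class_names))
dummy_input = tf.keras.Input(shape=(150, 150, 3))
resnet152(dummy_input)  # Build the model
resnet152.summary()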

From the code above we can see that we build the five different ResNet models introduced in the paper. We will train each of them from scratch on the Intel scene image dataset.

Training the Model

config = {
    "BATCH_SIZE" :32,
    "IMG_SIZE":150,
    "VAL_SPLIT":0.1,
    "EPOCHS":100,
    "PATIENCE":3,
    "lr":0.0001
}

def train_model(build_model,model_type):
    wandb.init(
        project="resnet-comparison",
        group="ResNet Variants",
        name=f"ResNet-{model_type}",  # Example: ResNet-18, ResNet-50, etc.
        config=config
    )
    strategy = tf.distribute.MirroredStrategy()
    early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=config["PATIENCE"])
    wandb_callback = WandbMetricsLogger()  # use the imported Keras integration to log metrics each epoch
    with strategy.scope():
        model = build_model(num_classes=len(class_names))
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=config["lr"]),
                        loss=tf.keras.losses.CategoricalCrossentropy(),
                        metrics=["accuracy"])
        history = model.fit(
            train_dataset,
            epochs=config["EPOCHS"],
            validation_data=valid_dataset,
            callbacks=[early_stopping_cb,wandb_callback]
        )

    test_loss, test_acc = model.evaluate(test_dataset)
    wandb.log({
        "test loss": test_loss,
        "test Accuracy": test_acc
    })

    wandb.finish()


LIST_MODEL = {
    "resnet18": build_resnet18,
    "resnet34": build_resnet34,
    "resnet50": build_resnet50,
    "resnet101": build_resnet101,
    "resnet152": build_resnet152,
}

# Train all models
for model_type, model in LIST_MODEL.items():
    train_model(model, model_type)

In this code we use tf.distribute.MirroredStrategy to utilize the 2 x T4 GPUs and speed up training. The model logs will be shown in wandb. The training may take several hours (up to 3).
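
Before training, it is worth confirming that both GPUs are actually visible to the strategy. A quick check, assuming the 2 x T4 Kaggle runtime:

# List visible GPUs and how many replicas MirroredStrategy will use
print(tf.config.list_physical_devices("GPU"))
print("Replicas:", tf.distribute.MirroredStrategy().num_replicas_in_sync)  # expected: 2 on a 2 x T4 notebook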

Evaluation with Wandb

From the evaluation we can see that the ResNet models easily overfit on the training data, which shows up when comparing the training accuracy with the validation accuracy. This likely happens because the models are trained from scratch, which requires more data. We can try transfer learning to improve the models.
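
As a pointer for that follow-up, a minimal transfer-learning sketch (not part of this notebook) could reuse the pretrained ResNet-50 from tf.keras.applications:

# Sketch of the transfer-learning alternative: a frozen ImageNet ResNet-50 backbone
# plus a small classifier head. Note that tf.keras.applications.ResNet50 expects
# resnet50.preprocess_input rather than the /255 scaling used above.
base = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                      input_shape=(150, 150, 3), pooling="avg")
base.trainable = False  # freeze the pretrained backbone, train only the head
transfer_model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(len(class_names), activation="softmax"),
])
transfer_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])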

Summary

This article demonstrates an implementation of the ResNet architecture from scratch on the Intel scene dataset and shows the challenge of training deep networks without pretrained weights. The models tend to overfit, and the validation accuracy only reaches about 0.70. We can improve this with regularization, more data, stronger augmentation, or transfer learning.

Source Code : https://github.com/fadhilelrizanda/intel-scene-resnet-scratch