Intel Image Classification is a dataset of natural scene images. It contains around 25,000 images of size 150 × 150, each labeled with one of six categories: buildings, forest, glacier, mountain, sea, and street.

The dataset is split into roughly 14,000 images for training, 3,000 for testing, and 7,000 for prediction. In this article we use only the training and testing sets.

link : Intel Image Classification

This notebook runs on Kaggle. Make sure to enable a GPU, because VGG-19 is computationally expensive. You can also use the notebook linked below.

Link : notebook

Building VGG-19 From Scratch

Preparing dependencies and dataset

!pip install wandb

We will use wandb (Weights & Biases) to monitor our model.

Import All Dependencies

import tensorflow as tf
import matplotlib.pyplot as plt
import os
import datetime
from wandb.integration.keras import WandbMetricsLogger
import wandb

Initialize wandb

wandb.login(key="key")

You can pass the key directly or enter it manually when prompted. Keep your key secret: do not commit it or leave it in a public notebook.
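One way to keep the key out of the notebook is to read it from an environment variable. The sketch below is a minimal illustration, not the author's setup: the helper name `resolve_wandb_key` is hypothetical, and it assumes the key has been stored in the `WANDB_API_KEY` environment variable.

```python
import os

def resolve_wandb_key(env_var="WANDB_API_KEY"):
    """Return the API key stored in the environment, or None if it is not set."""
    key = os.environ.get(env_var, "").strip()
    return key or None

api_key = resolve_wandb_key()
# In the notebook you could then call:
#   wandb.login(key=api_key)
# When api_key is None, wandb.login falls back to its usual prompt.
```

This way the key never appears in the saved notebook or in its version history.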

Data Preprocessing

Configure Parameters

config = {
    "BATCH_SIZE": 64,
    "IMG_SIZE": 150,
    "VAL_SPLIT": 0.1,
    "EPOCHS": 10,
    "PATIENCE": 2,
}

We create a configuration dictionary to adjust our training pipeline. This configuration is also uploaded to wandb so we can compare it against future model versions.

Load Dataset

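# os.listdir returns entries in arbitrary order, so sort for a reproducible class-to-index mapping
class_names = sorted(os.listdir("/kaggle/input/intel-image-classification/seg_train/seg_train"))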
print(len(class_names))

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    shuffle=True,
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="training",
    interpolation="bilinear",
    verbose=True
)

valid_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    shuffle=True,
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="validation",
    interpolation="bilinear",
    verbose=True
)

test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_test/seg_test",
    labels='inferred',
    label_mode='categorical',
    class_names=class_names,
    color_mode='rgb',
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"],config["IMG_SIZE"]),
    interpolation='bilinear',
    verbose=True
)

This code creates tf.data datasets from the image directories. The labels are one-hot encoded, and the training folder is split into 90% training and 10% validation. The images are resized to 150 × 150.
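To make the 90/10 split concrete, the following sketch computes how many images and batches each subset would contain. It assumes the rounded figure of 14,000 training images quoted above (the real folder may differ slightly), and the helper `split_sizes` is a hypothetical name used only for illustration.

```python
import math

def split_sizes(num_images, val_split, batch_size):
    """Return sample and batch counts for a train/validation split."""
    num_val = int(num_images * val_split)   # validation share, rounded down
    num_train = num_images - num_val        # everything else is used for training
    return {
        "train_samples": num_train,
        "val_samples": num_val,
        "train_batches": math.ceil(num_train / batch_size),
        "val_batches": math.ceil(num_val / batch_size),
    }

sizes = split_sizes(num_images=14_000, val_split=0.1, batch_size=64)
print(sizes)
```

With these assumed inputs the split gives 12,600 training images in 197 batches and 1,400 validation images in 22 batches.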

Visualize Distribution


def get_data_distribution(datasets=()):
  # Collect the integer class index of every sample across all given datasets
  labels = []
  for dataset in datasets:
    for _,label in dataset:
      labels.extend(tf.argmax(label,axis=1).numpy())
  # y holds the unique class indices, count holds how often each one occurs
  y, idx, count = tf.unique_with_counts(labels)
  return y, count

def visualize_data(class_data,count):
  plt.figure(figsize=(10,10))
  plt.bar(class_data,count,align="center")
  plt.title("Class Distribution")
  plt.ylabel("Number of Data")
  plt.xlabel("Class")
  # Adjust the layout after titles and labels are set
  plt.tight_layout()
  plt.show()

class_data, counts = get_data_distribution([train_dataset,valid_dataset,test_dataset])
visualize_data([class_names[cd] for cd in class_data],counts.numpy())

We visualize the distribution of the labels. A balanced class distribution is important for achieving good performance on every class.
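Besides looking at the bar chart, balance can be summarized with a single number. The sketch below computes the ratio between the largest and the smallest class. The function name and the example counts are hypothetical and are not the real counts of this dataset.

```python
def imbalance_ratio(counts):
    """Return largest class size divided by smallest class size (1.0 means perfectly balanced)."""
    counts = list(counts)
    if not counts or min(counts) <= 0:
        raise ValueError("counts must be non-empty and strictly positive")
    return max(counts) / min(counts)

# Hypothetical per-class counts, used only to illustrate the calculation
example_counts = [2600, 2700, 2900, 3000, 2750, 2850]
print(f"Imbalance ratio: {imbalance_ratio(example_counts):.2f}")
```

A ratio close to 1 suggests the classes are roughly balanced, while a much larger value suggests that some classes may need resampling or class weights.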

Visualize Sample Image

plt.figure(figsize=(10,10))
for img,label in train_dataset.take(1):
  for i in range(16):
    ax = plt.subplot(4,4,i+1)
    plt.imshow(img[i].numpy().astype("uint8"))
    plt.title(f"Label :{class_names[tf.argmax(label[i])]}")
    plt.axis(False)
plt.tight_layout()
plt.show()

The code above displays 16 images from the first batch. The images appear in random order because the dataset is shuffled, so we can use it to train our model.

Building and Training Model VGG-19

class VGG19(tf.keras.Model):
  def __init__(self,input_shape,output_shape,**kwargs):
    super().__init__(**kwargs)
    self.conv_blocks=[
        self._conv_block(64,2),
        self._conv_block(128,2),
        self._conv_block(256,4),
        self._conv_block(512,4),
        self._conv_block(512,4),
    ]
    self.flatten = tf.keras.layers.Flatten()
    self.dense1 = tf.keras.layers.Dense(units=4096,activation="relu",kernel_regularizer=tf.keras.regularizers.L2(0.0005))
    self.dropout1 = tf.keras.layers.Dropout(0.5)
    self.dense2 = tf.keras.layers.Dense(units=4096,activation="relu",kernel_regularizer=tf.keras.regularizers.L2(0.0005))
    self.dropout2 = tf.keras.layers.Dropout(0.5)
    self.output_layer = tf.keras.layers.Dense(units=output_shape,activation="softmax")

  def _conv_block(self,filters,num_layers):
    block = tf.keras.Sequential()
    for _ in range(num_layers):
      block.add(tf.keras.layers.Conv2D(filters=filters, kernel_size=(3,3), strides=(1,1), activation="relu",padding="same"))
    block.add(tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)))
    return block

  def call(self,inputs,training=False):
    x = inputs
    for block in self.conv_blocks:
      x = block(x)
    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dropout1(x,training=training)
    x = self.dense2(x)
    x = self.dropout2(x,training=training)
    x = self.output_layer(x)
    return x

In the previous article we saw that VGG comes in several versions that differ in network depth. The deepest version is VGG-19.

We construct the model from scratch using Keras layers. For flexibility, we build it by subclassing tf.keras.Model. With this approach we can define a block once and repeat it several times, which is efficient when building deeper or custom architectures.
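The block structure also explains the name of the network. The sketch below counts the weight layers defined in the class above and traces how the spatial size shrinks through the five pooling steps. It assumes a 150 × 150 input and unpadded 2 × 2 pooling with stride 2, as in the model code. The helper names are hypothetical.

```python
CONV_LAYERS_PER_BLOCK = [2, 2, 4, 4, 4]   # convolutions in each of the five blocks
DENSE_LAYERS = 3                           # two hidden dense layers plus the output layer

def spatial_sizes(size, num_blocks):
    """Return the feature-map side length before and after each pooling step."""
    sizes = [size]
    for _ in range(num_blocks):
        size = size // 2   # 2x2 max pooling with stride 2 and no padding rounds down
        sizes.append(size)
    return sizes

weight_layers = sum(CONV_LAYERS_PER_BLOCK) + DENSE_LAYERS
trace = spatial_sizes(150, len(CONV_LAYERS_PER_BLOCK))
print(weight_layers)   # 16 convolutional + 3 dense layers
print(trace)
```

The count comes out to 19 weight layers, and the feature map shrinks from 150 to 4 pixels per side, so the flatten layer receives 4 × 4 × 512 = 8,192 values.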

VGG_model = VGG19((config["IMG_SIZE"],config["IMG_SIZE"],3),len(class_names))
VGG_model(tf.keras.layers.Input(shape=(config["IMG_SIZE"],config["IMG_SIZE"],3)))
VGG_model.summary()

After defining the model, we build it based on the configuration. The resulting model has 70,388,806 parameters in total, which is a very large number.
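The parameter total can be checked by hand. The sketch below recomputes it from the layer definitions, assuming a 150 × 150 RGB input, 3 × 3 kernels with a bias per filter, and six output classes. The function names are hypothetical helpers, not part of Keras.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a square convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(in_units, out_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_units + 1) * out_units

def vgg19_param_count(img_size=150, channels=3, num_classes=6):
    blocks = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]  # (filters, conv layers)
    total, in_ch, size = 0, channels, img_size
    for filters, num_layers in blocks:
        for _ in range(num_layers):
            total += conv_params(in_ch, filters)
            in_ch = filters
        size //= 2                     # max pooling halves the side length, rounding down
    in_units = size * size * in_ch     # size of the flattened feature map
    for units in (4096, 4096, num_classes):
        total += dense_params(in_units, units)
        in_units = units
    return total

print(f"{vgg19_param_count():,}")
```

Most of the parameters sit in the first dense layer, which alone connects 8,192 inputs to 4,096 units.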

VGG_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=["accuracy"])

Next, we compile the model by specifying the optimizer, the loss, and accuracy as our metric.
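To illustrate what the loss measures, here is a minimal pure-Python sketch of categorical cross-entropy for a single sample with a one-hot label. It is a simplified illustration, not the Keras implementation, and the example probabilities are made up.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and a predicted probability vector."""
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))

# Hypothetical sample: the true class is index 2 and the model assigns it probability 0.7
y_true = [0, 0, 1, 0, 0, 0]
y_pred = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))
```

Because the label is one-hot, only the probability assigned to the true class contributes, so the loss here equals -ln(0.7) and shrinks toward 0 as that probability approaches 1.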

Custom Callback

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=config["PATIENCE"])
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("model_cb.keras",save_best_only=True)

# Initialize wandb
run = wandb.init(
    project="vgg19-intel-scratch",
    config=config,
)

To monitor training, we use several callbacks: early stopping, model checkpointing, and wandb logging. After defining the callbacks, we initialize our wandb run with the project name and the configuration.
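The effect of the patience setting can be sketched in plain Python. The function below is a simplified model of early stopping on validation loss, not the Keras source code, and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses; with patience 2, training stops after two bad epochs in a row
stop_at = early_stopping_epoch([1.0, 0.8, 0.85, 0.9, 0.7], patience=2)
print(stop_at)
```

In this example training would stop at epoch index 3 and never reach the later improvement, which is the trade-off of a small patience value. Note that EarlyStopping does not restore the best weights by default, so the checkpoint saved with save_best_only=True is the place to reload the best model from.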

Training Model

history = VGG_model.fit(
    train_dataset,
    epochs=config["EPOCHS"],
    validation_data=valid_dataset,
    callbacks=[early_stopping_cb, checkpoint_cb, WandbMetricsLogger(log_freq=10)],
)

To train the model, we call the fit method with our configuration. The returned history object stores the metrics recorded during training.

Optionally, if you have multiple GPUs, you can use distributed training (check the notebook) to speed up the training process.
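The history object exposes its metrics as a dictionary of per-epoch lists in `history.history`. As a small sketch, the helper below (a hypothetical name) finds the epoch with the lowest validation loss. The dictionary here is made-up example data with the same shape as the one returned by training.

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return (1-based epoch number, value) for the lowest recorded value of the metric."""
    values = history_dict[metric]
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# Hypothetical values shaped like history.history
example_history = {
    "loss": [1.20, 0.85, 0.66, 0.52],
    "val_loss": [1.05, 0.80, 0.74, 0.79],
}
epoch, value = best_epoch(example_history)
print(f"Best val_loss {value:.2f} at epoch {epoch}")
```

In the notebook, the same call would be `best_epoch(history.history)`.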

Evaluation

Evaluate on test dataset

loss, acc = VGG_model.evaluate(test_dataset)
print(loss,acc)

plt.plot(history.history["loss"],label="Train loss")
plt.plot(history.history["val_loss"],label="Val loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

Export The Label From Test Dataset

import numpy as np

table = wandb.Table(columns=["Image", "True Label", "Predicted Label"])

y_pred = []
y_label = []
all_images = []

# Collect predictions, labels, and images in a single pass so they stay aligned
for img_batch, labels in test_dataset:
    batch_pred = tf.argmax(VGG_model(img_batch, training=False), axis=1).numpy()
    y_pred.extend(batch_pred)
    batch_labels = tf.argmax(labels, axis=1).numpy()
    y_label.extend(batch_labels)
    all_images.extend(img_batch.numpy())

y_pred = np.array(y_pred)
y_label = np.array(y_label)
all_images = np.array(all_images)


plt.figure(figsize=(10, 10))
for i in range(16):  
    ax = plt.subplot(4, 4, i + 1)
    img = all_images[i].astype("uint8")  # pixel values are already in the range 0 to 255
    plt.imshow(img)
    plt.title(f"Actual: {class_names[y_label[i]]}\nPred: {class_names[y_pred[i]]}")
    plt.axis(False)

plt.tight_layout()
plt.show()


for i in range(16):
    table.add_data(
        wandb.Image(all_images[i].astype("uint8")),
        class_names[y_label[i]],     
        class_names[y_pred[i]]      
    )

run.log({"predictions_table": table})
run.finish()

Evaluate Using Confusion Matrix

import seaborn as sns

confusion_matrix = tf.math.confusion_matrix(
    y_label,
    y_pred,
    num_classes=len(class_names)
)
print("Confusion Matrix:")
print(confusion_matrix.numpy())

plt.figure()
# Rows are the actual classes, columns are the predicted classes
sns.heatmap(confusion_matrix.numpy(),annot=True,fmt="d",cmap="Blues",
            xticklabels=class_names,yticklabels=class_names)
plt.xlabel("Predicted Label")
plt.ylabel("Actual Label")
plt.show()

Evaluate Using Classification Report

from sklearn.metrics import classification_report
# Use the real class names so the report rows are readable
report = classification_report(y_label,y_pred,target_names=class_names)
print("Classification Report")
print(report)
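The numbers in the classification report can be derived directly from the confusion matrix. The sketch below shows the calculation for precision, recall, and F1 per class, assuming rows are actual labels and columns are predictions (the layout used above). The function name and the small 2 × 2 matrix are hypothetical.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    results = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted = sum(cm[row][k] for row in range(n))   # column sum: predicted as class k
        actual = sum(cm[k])                                # row sum: truly class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results

# Hypothetical two-class confusion matrix, used only to illustrate the calculation
example_cm = [[8, 2],
              [1, 9]]
for idx, (p, r, f1) in enumerate(per_class_metrics(example_cm)):
    print(f"class {idx}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In the notebook, the same function could be applied to `confusion_matrix.numpy().tolist()` to cross-check the report.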

Summary

In this article we saw that building VGG-19 from scratch is feasible, but the result still needs improvement. We also found that training VGG-19 is computationally more expensive than the previous model (LeNet), and that the model has a very large number of parameters. Next, we will explore how to build models that are efficient while keeping high accuracy.

Intel Images Classification Using VGG-19