Intel Image Classification is a dataset consisting of natural scene images. It contains about 25,000 images of size 150 × 150, labeled in 6 different categories: buildings, forest, glacier, mountain, sea, and street.
The dataset is split into about 14,000 images for training, 3,000 images for testing, and 7,000 images for prediction. In this article we will only use the training and testing sets.
link : Intel Image Classification
This notebook runs on Kaggle. Make sure to enable a GPU, because VGG19 is computationally expensive. You can also use the notebook from here
Link : notebook
Building VGG-19 From Scratch
Preparing dependencies and dataset
!pip install wandb
We will use wandb to monitor our model.
Import All Dependencies
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import datetime
from wandb.integration.keras import WandbMetricsLogger
import wandb
Initialize wandb
wandb.login(key="key")
You can pass the key directly or enter it manually when prompted. Make sure to keep your key secret, for example by not leaving it in a notebook that you share publicly.
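One way to keep the key out of the notebook is to read it from an environment variable. The sketch below assumes the key has been stored under the name WANDB_API_KEY, which is the variable wandb itself looks for. The helper name get_wandb_key is hypothetical and not part of wandb.

```python
import os

def get_wandb_key(env=None):
    """Return the wandb API key from the environment, or None if it is not set."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None

key = get_wandb_key()
if key is None:
    print("WANDB_API_KEY is not set; wandb.login() would prompt for the key.")
else:
    print("Found a wandb key in the environment (value not printed).")
# In the notebook you would then call: wandb.login(key=key)
```

With this approach the key never appears in a code cell, so sharing the notebook does not expose it.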
Data Preprocessing
Configure Parameters
config = {
    "BATCH_SIZE": 64,
    "IMG_SIZE": 150,
    "VAL_SPLIT": 0.1,
    "EPOCHS": 10,
    "PATIENCE": 2
}
We create a configuration variable to adjust our training pipeline. This configuration is also uploaded to wandb so that we can check it against future model versions.
Load Dataset
class_names = os.listdir("/kaggle/input/intel-image-classification/seg_train/seg_train")
print(len(class_names))
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"], config["IMG_SIZE"]),
    shuffle=True,
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="training",
    interpolation="bilinear",
    verbose=True
)
valid_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_train/seg_train",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"], config["IMG_SIZE"]),
    shuffle=True,
    seed=32,
    validation_split=config["VAL_SPLIT"],
    subset="validation",
    interpolation="bilinear",
    verbose=True
)
test_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    "/kaggle/input/intel-image-classification/seg_test/seg_test",
    labels="inferred",
    label_mode="categorical",
    class_names=class_names,
    color_mode="rgb",
    batch_size=config["BATCH_SIZE"],
    image_size=(config["IMG_SIZE"], config["IMG_SIZE"]),
    interpolation="bilinear",
    verbose=True
)
This code creates tf.data datasets from the image directories. The labels are one-hot encoded, and the training folder is split into 90% training and 10% validation data. The images are resized to 150 × 150.
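To make the one-hot encoding and the 0.9 / 0.1 split concrete, here is a small pure-Python sketch. It uses the approximate figure of 14,000 training images quoted at the start of the article. The helper names one_hot and split_sizes are hypothetical, and the class order shown is only illustrative, since the real order comes from os.listdir.

```python
def one_hot(index, num_classes):
    """Return a one-hot list with 1.0 at position `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def split_sizes(num_samples, val_split):
    """Return (train, validation) sample counts for a given validation fraction."""
    num_val = int(val_split * num_samples)
    return num_samples - num_val, num_val

# Illustrative class order; the notebook takes the order from os.listdir
classes = ["buildings", "forest", "glacier", "mountain", "sea", "street"]

print(one_hot(classes.index("glacier"), len(classes)))
print(split_sizes(14_000, 0.1))
```

Each label becomes a vector with a single 1 at the position of its class, which is the format that the categorical cross-entropy loss used later expects.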
Visualize Distribution
def get_data_distribution(dataset=()):
    # Collect the integer class index of every sample in every given dataset
    labels = []
    for data in dataset:
        for _, label in data:
            labels.extend(tf.argmax(label, axis=1).numpy())
    y, idx, count = tf.unique_with_counts(labels)
    return y, count
def visualize_data(class_data, count):
    # Bar chart of the number of samples per class
    plt.figure(figsize=(10, 10))
    plt.bar(class_data, count, align="center")
    plt.title("Class Distribution")
    plt.ylabel("Number of Data")
    plt.xlabel("Class")
    plt.tight_layout()
    plt.show()
class_data, counts = get_data_distribution([train_dataset,valid_dataset,test_dataset])
visualize_data([class_names[cd] for cd in class_data],counts.numpy())
We visualize the distribution of the labels. A balanced class distribution is important for achieving good performance on every class.
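Besides the bar chart, balance can be summarized with a single number. The sketch below computes the ratio between the largest and the smallest class, where a value close to 1 means the classes are balanced. The label list is made-up toy data and the helper name imbalance_ratio is hypothetical.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Toy labels for illustration only, not taken from the dataset
toy_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
print(Counter(toy_labels))
print(imbalance_ratio(toy_labels))
```

In the notebook, the integer labels collected inside get_data_distribution could be passed to a helper like this one.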
Visualize Sample Image
plt.figure(figsize=(10, 10))
for img, label in train_dataset.take(1):
    for i in range(16):
        ax = plt.subplot(4, 4, i + 1)
        plt.imshow(img[i].numpy().astype("uint8"))
        plt.title(f"Label :{class_names[tf.argmax(label[i])]}")
        plt.axis(False)
plt.tight_layout()
plt.show()
The code above visualizes 16 images from the first batch. We can see that the images are shuffled rather than ordered by class, so we can use the data to train our model.
Building and Training Model VGG-19
class VGG19(tf.keras.Model):
    def __init__(self, input_shape, output_shape, **kwargs):
        super().__init__(**kwargs)
        # Five convolutional blocks: (filters, number of conv layers)
        self.conv_blocks = [
            self._conv_block(64, 2),
            self._conv_block(128, 2),
            self._conv_block(256, 4),
            self._conv_block(512, 4),
            self._conv_block(512, 4),
        ]
        self.flatten = tf.keras.layers.Flatten()
        # Two fully connected layers with L2 regularization and dropout
        self.dense1 = tf.keras.layers.Dense(units=4096, activation="relu", kernel_regularizer=tf.keras.regularizers.L2(0.0005))
        self.dropout1 = tf.keras.layers.Dropout(0.5)
        self.dense2 = tf.keras.layers.Dense(units=4096, activation="relu", kernel_regularizer=tf.keras.regularizers.L2(0.0005))
        self.dropout2 = tf.keras.layers.Dropout(0.5)
        self.output_layer = tf.keras.layers.Dense(units=output_shape, activation="softmax")

    def _conv_block(self, filters, num_layers):
        # num_layers 3x3 convolutions followed by one 2x2 max pooling
        block = tf.keras.Sequential()
        for _ in range(num_layers):
            block.add(tf.keras.layers.Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1), activation="relu", padding="same"))
        block.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
        return block

    def call(self, inputs, training=False):
        x = inputs
        for block in self.conv_blocks:
            x = block(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dropout1(x, training=training)
        x = self.dense2(x)
        # Pass the training flag here as well, consistent with dropout1
        x = self.dropout2(x, training=training)
        x = self.output_layer(x)
        return x
From the previous article we learned that the VGG model comes in several versions that differ in network depth. The deepest version is VGG-19.
We construct the model from scratch using Keras layers. For flexibility, we build it by subclassing the Keras Model class. With this approach we can define a block once and repeat it several times, which is efficient when building deeper or custom architectures.
VGG_model = VGG19((config["IMG_SIZE"],config["IMG_SIZE"],3),len(class_names))
VGG_model(tf.keras.layers.Input(shape=(config["IMG_SIZE"],config["IMG_SIZE"],3)))
VGG_model.summary()
After defining the model, we build it by calling it on an input with the configured shape. The resulting model has 70,388,806 parameters in total, which is a very large number.
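The parameter count can be checked by hand. The sketch below is a back-of-the-envelope calculation, assuming the architecture defined in the class above: 3 × 3 convolutions with biases, 2 × 2 max pooling that halves the spatial size (rounding down), two dense layers with 4096 units, and 6 output classes. The function names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one Conv2D layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    """Weights plus biases of one Dense layer."""
    return in_units * out_units + out_units

def vgg19_params(img_size=150, channels=3, num_classes=6):
    blocks = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]
    total, in_ch, size = 0, channels, img_size
    for filters, num_layers in blocks:
        for _ in range(num_layers):
            total += conv_params(in_ch, filters)
            in_ch = filters
        size //= 2  # each max pooling halves height and width, rounding down
    flat = size * size * in_ch  # 4 * 4 * 512 = 8192 for a 150 x 150 input
    total += dense_params(flat, 4096)
    total += dense_params(4096, 4096)
    total += dense_params(4096, num_classes)
    return total

print(f"{vgg19_params():,}")  # 70,388,806
```

Most of the parameters sit in the dense layers: the first dense layer alone accounts for about 33.6 million of them, while all 16 convolutional layers together account for about 20 million.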
VGG_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"]
)
Next, we compile the model with an optimizer and a loss function, and we set accuracy as our metric.
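To give some intuition for the loss, here is a minimal pure-Python sketch of categorical cross-entropy for a single sample with a one-hot label. It is a simplified illustration, not the Keras implementation; the clipping constant eps and the example probabilities are assumptions chosen for the demonstration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0, 0, 0]                         # true class is index 2
confident = [0.02, 0.02, 0.90, 0.02, 0.02, 0.02]    # mostly correct prediction
unsure = [1 / 6] * 6                                # uniform guess over 6 classes

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, unsure))
```

The loss is small when the model assigns high probability to the true class and grows as that probability shrinks, which is what the optimizer pushes against during training.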
Custom Callback
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=config["PATIENCE"])
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("model_cb.keras",save_best_only=True)
# Initialize wandb
run = wandb.init(
project="vgg19-intel-scratch",
config=config,
)
To monitor the model we use several callbacks: early stopping, model checkpointing, and wandb logging. After defining the first two, we initialize our wandb project. The wandb logger itself is passed to fit when we train the model.
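The effect of the patience setting can be illustrated with a short simulation. This is a simplified sketch of the early stopping idea (stop once the monitored validation loss has not improved for `patience` consecutive epochs), not the Keras source. The loss values are made up and the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses for illustration
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
print(early_stopping_epoch(losses, patience=2))
```

With a patience of 2, training stops after the second consecutive epoch without improvement, so the later drop to 0.7 is never reached. A larger patience tolerates more noise in the validation loss at the cost of longer training.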
Training Model
history = VGG_model.fit(
    train_dataset,
    epochs=config["EPOCHS"],
    validation_data=valid_dataset,
    callbacks=[early_stopping_cb, checkpoint_cb, WandbMetricsLogger(log_freq=10)]
)
To train the model, we call the fit method with our configuration. The returned history object stores the loss and metrics for each epoch of the training process.
Optionally, if you have multiple GPUs, you can use distributed training (check the notebook) to speed up the training process.
Evaluation
Evaluate on test dataset
loss, acc = VGG_model.evaluate(test_dataset)
print(loss, acc)
# Plot the training and validation loss per epoch
plt.plot(history.history["loss"], label="Train loss")
plt.plot(history.history["val_loss"], label="Val loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()
Export The Label From Test Dataset
import numpy as np
table = wandb.Table(columns=["Image", "True Label", "Predicted Label"])
y_pred = []
y_label = []
all_images = []
# Single pass over the test set: collect predictions, true labels, and images together
for img_batch, labels in test_dataset:
    batch_pred = tf.argmax(VGG_model(img_batch, training=False), axis=1).numpy()  # predicted class index
    y_pred.extend(batch_pred)
    batch_labels = tf.argmax(labels, axis=1).numpy()  # true class index from the one-hot label
    y_label.extend(batch_labels)
    all_images.extend(img_batch.numpy())
y_pred = np.array(y_pred)
y_label = np.array(y_label)
all_images = np.array(all_images)
plt.figure(figsize=(10, 10))
for i in range(16):
    ax = plt.subplot(4, 4, i + 1)
    # Pixel values are already in the 0-255 range, so only cast them to uint8
    img = all_images[i].astype("uint8")
    plt.imshow(img)
    plt.title(f"Actual: {class_names[y_label[i]]}\nPred: {class_names[y_pred[i]]}")
    plt.axis(False)
plt.tight_layout()
plt.show()
for i in range(16):
    table.add_data(
        wandb.Image(all_images[i]),
        class_names[y_label[i]],
        class_names[y_pred[i]]
    )
run.log({"predictions_table": table})
run.finish()
Evaluate Using Confusion Matrix
import seaborn as sns  # provides the heatmap plot used below
confusion_matrix = tf.math.confusion_matrix(
    y_label,
    y_pred,
    num_classes=len(class_names)
)
print("Confusion Matrix:")
print(confusion_matrix.numpy())
plt.figure()
sns.heatmap(confusion_matrix.numpy(), annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted Label")
plt.ylabel("Actual Label")
plt.show()
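To read the heatmap, it helps to see how a confusion matrix is built. The following pure-Python sketch counts (actual, predicted) pairs and derives the per-class recall from each row. The labels are toy values for illustration, and the function names are hypothetical.

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Toy labels for illustration only
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

toy_matrix = build_confusion_matrix(toy_true, toy_pred, num_classes=3)
for row in toy_matrix:
    print(row)
print(per_class_recall(toy_matrix))
```

The diagonal holds the correct predictions, so large off-diagonal values in the heatmap point to pairs of classes that the model confuses with each other.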
Evaluate Using Classification Report
from sklearn.metrics import classification_report
report = classification_report(y_label,y_pred,target_names=[f"Class {i}" for i in range(len(class_names))])
print("Classification Report")
print(report)
Summary
In this article we saw that building VGG-19 from scratch is feasible, but the result still needs improvement. We also found that training VGG-19 is far more computationally expensive than training the previous model (LeNet), and that the model has a very large number of parameters. Next, we will explore how to build models that are efficient while still reaching high accuracy.