Table of contents
- Introduction
- LeNet-5
- Working with Tensorflow
- Installing Dependencies
- Import Libraries
- Importing Dataset
- Splitting Data and converting to TF.data
- Data Distribution
- Data Visualization
- Configure Parameters
- Preprocessing Data
- Building Callback
- Initialize wandb
- Building Model from scratch
- Training the model
- Evaluate Performance
- Evaluate on Testing Dataset
- Plot The Predicted Image
- Evaluate using Confusion Matrix
- Evaluation Using Tensorboard
- Evaluate on Wandb
- Summary
Machine learning, especially in computer vision, is developing rapidly. To keep up with this pace, we need to keep learning, starting from the basics. One of those basics is LeNet, developed by Yann LeCun. In this blog we will explore the LeNet architecture.
Introduction
LeNet is a family of convolutional neural networks (CNNs) used on handwritten digit datasets such as MNIST. The idea behind LeNet is to build a CNN-based architecture that can classify images in an image classification task.
LeNet-5
LeNet-5 is the latest development of the LeNet architecture. It is shallower than modern architectures because computational resources were limited at the time. Today we have far more computing power: more CPU cores, more RAM, and rapidly improving GPUs. LeNet, in contrast, was designed under much tighter computational constraints.
The architecture of LeNet-5 consists of convolutional (Conv) layers and fully connected (FC) layers. The dataset consists of images with a shape of 28 × 28 × 1, split into 60,000 training samples and 10,000 test samples, which is sufficient for training the model. The model has several steps:
Input layer
Takes the input image, which has a size of 28×28×1
Conv Layer 1
A convolutional layer with a 5 × 5 kernel, padding = 2, and 6 filters
Average Pooling
To reduce the size of the feature map, a pooling layer averages each window of the feature map. The pooling layer uses stride = 2
The convolutional and pooling layers are then repeated a second time
Fully connected layer
The FC part consists of 3 layers with 120, 84, and 10 neurons. The last layer has 10 neurons because it classifies the handwritten digits 0-9
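The layer steps above can be traced numerically. The sketch below is a minimal, framework-free calculation of the feature-map size after each layer, using the standard formula (size + 2 × padding − kernel) / stride + 1. The helper name `out_size` is hypothetical, the 2×2 pooling window is an assumption, and the second convolution (16 filters, 5×5, no padding) is taken from the layer table later in this post.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

shapes = [("input", 28, 1)]

size = out_size(28, kernel=5, padding=2)    # conv 1: (28 + 4 - 5) / 1 + 1 = 28
shapes.append(("conv1", size, 6))

size = out_size(size, kernel=2, stride=2)   # pool 1: (28 - 2) / 2 + 1 = 14
shapes.append(("pool1", size, 6))

size = out_size(size, kernel=5)             # conv 2: 14 - 5 + 1 = 10 (assumed: no padding)
shapes.append(("conv2", size, 16))

size = out_size(size, kernel=2, stride=2)   # pool 2: (10 - 2) / 2 + 1 = 5
shapes.append(("pool2", size, 16))

flat_features = size * size * 16            # number of inputs to the first FC layer

for name, s, channels in shapes:
    print(f"{name}: {s}x{s}x{channels}")
print("flatten:", flat_features)
```

Under these assumptions, the first fully connected layer receives 5 × 5 × 16 = 400 features.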
To understand it better, let's build the LeNet-5 model using TensorFlow.
Working with Tensorflow
The code below was run in Google Colab.
Installing Dependencies
!pip install tensorflow==2.15
!pip install -U tensorboard_plugin_profile
!pip install wandb
I use TensorFlow 2.15, together with TensorBoard for performance evaluation. I also use wandb to track the model, which is good practice even for a simple model like LeNet-5.
Import Libraries
import tensorflow as tf
import numpy as np  # used later for label counting and predictions
import matplotlib.pyplot as plt
import os, datetime
import wandb
from wandb.integration.keras import WandbMetricsLogger
print(tf.__version__)
We need all the libraries above, from collecting the data to visualizing the performance. To speed up training, we can use the GPU provided in Google Colab. The code below lists the available GPUs and enables memory growth on each one, so TensorFlow allocates GPU memory as needed instead of reserving it all at once.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
Also, don't forget to log in to wandb.
wandb.login(key="YOUR_KEY")
Importing Dataset
The MNIST dataset is already available in the tf.keras.datasets module. To download it, run the code below.
dataset = tf.keras.datasets.mnist
(X_train,y_train), (X_test,y_test) = dataset.load_data()
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
The output shows the shapes of the training and test data, which contain 60,000 and 10,000 samples respectively.
Splitting Data and converting to TF.data
In TensorFlow, tf.data is the API for building optimized input pipelines, which can speed up training. It helps manage memory usage and avoid bottlenecks in the training pipeline. To use it, try the code below.
num_train_data = X_train.shape[0]
num_test_data = X_test.shape[0]
train_data = tf.data.Dataset.from_tensor_slices((X_train,y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((X_test,y_test))
valid_data = test_dataset.take(int(0.5*num_test_data))
test_data = test_dataset.skip(int(0.5*num_test_data))
In this code, we first count the number of samples in each set (training, testing) so that we can split the data. Then we convert each set from NumPy arrays to tf.data using the from_tensor_slices method. Finally, we split the original test set 50/50 into validation and test data using .take and .skip.
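To make the split concrete, here is a small sketch that only emulates the semantics of .take and .skip on a plain Python list. The helper functions `take` and `skip` are hypothetical stand-ins, not the tf.data implementation: take(n) keeps the first n elements and skip(n) drops them and keeps the rest, so the two halves never overlap.

```python
from itertools import islice

def take(items, n):
    """Keep only the first n elements (mirrors the idea of Dataset.take)."""
    return list(islice(items, n))

def skip(items, n):
    """Drop the first n elements and keep the rest (mirrors Dataset.skip)."""
    return list(islice(items, n, None))

samples = list(range(10))        # stand-in for the 10,000 test samples
half = int(0.5 * len(samples))

valid_part = take(samples, half)
test_part = skip(samples, half)

print(valid_part)
print(test_part)
```

Because the split depends only on element order, it is reproducible as long as the source dataset is not shuffled before the split.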
Data Distribution
In real-world cases, classes are often not equally represented, so we need to check whether the data distribution is balanced or imbalanced.
def distribution_class(train_labels, test_labels):
    # Count how often each digit appears across the training and test labels
    labels = np.concatenate((train_labels, test_labels), axis=0)
    unique_labels, counts = np.unique(labels, return_counts=True)
    plt.bar(unique_labels, counts)
    plt.xlabel("Digit")
    plt.ylabel("Count")
    plt.show()
    return unique_labels

unique_label = distribution_class(y_train, y_test)
This function takes the NumPy arrays loaded at the beginning instead of the tf.data datasets. Why? Because with tf.data we would have to iterate over the dataset to collect the labels, which is slower than reading the arrays that are already in memory. The output of the cell will look like this.
From the distribution, we can see that the classes are roughly balanced.
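Besides reading the bar chart, balance can also be checked with a single number. The sketch below is one possible check, assuming NumPy is available: it computes the ratio between the largest and smallest class counts on synthetic labels. The function name `imbalance_ratio` and the sample arrays are made up for illustration. A ratio close to 1 means the classes are balanced.

```python
import numpy as np

def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.min()

# Synthetic example: 10 classes with 100 samples each
balanced = np.repeat(np.arange(10), 100)

# Synthetic example: class 0 has 900 samples, classes 1-9 have 10 each
skewed = np.concatenate([np.zeros(900, dtype=int), np.arange(1, 10).repeat(10)])

print(imbalance_ratio(balanced))
print(imbalance_ratio(skewed))
```

The same function could be called on the concatenated y_train and y_test arrays from this post.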
Data Visualization
Before passing the data to the model, we want to look at it to understand the problem and how the data behaves. In our case, the data consists of handwritten digits. To visualize them, we can use the code below.
for i, (img, label) in enumerate(train_data.take(16)):
    ax = plt.subplot(4,4,i+1)
    plt.imshow(img)
    plt.title(label.numpy())
plt.show()
In this code, we take 16 samples from the training data and plot them with plt. The output should look like this.
The label is shown above each image, and the labels match the images.
Configure Parameters
In this case, we only configure a few parameters of the training pipeline, such as num_classes, batch_size, and image_size. The following code defines the config dictionary.
config = {
    "num_classes" : 10,
    "batch_size" : 64,
    "image_size" : 28,
    "image_channels" : 1,
    "buffer_size" : 10000,
    "learning_rate" : 1e-3,
    "epochs" : 200,
    "earlystopping_patience" : 3
}
Preprocessing Data
Several methods are used to preprocess the data: normalizing, reshaping, and one-hot encoding. We normalize the data to scale the pixel values from [0, 255] to [0, 1]. We reshape the data so that every image has the same size and a channel dimension before it is given to the model. One-hot encoding turns a label (0-9) into a vector such as [0,0,0,0,0,0,0,0,0,1] (which represents label 9). With this encoding, each of the 10 neurons in the last layer represents one class label.
def normalize(img):
    # Scale pixel values from [0, 255] to [0, 1]
    img = tf.cast(img,tf.float32)/255.0
    return img

def one_hot_label(labels,num_classes):
    # Turn an integer label into a one-hot vector of length num_classes
    label = tf.one_hot(labels,depth=num_classes)
    return label

def reshape_img(img,shape,channels):
    # Add the channel dimension (28,28) -> (28,28,1), then resize to (shape, shape)
    img = tf.expand_dims(img,axis=-1)
    img = tf.image.resize(img,(shape,shape))
    return img

def img_normalize_one_hot(img,label,config):
    img = reshape_img(img,config["image_size"],config["image_channels"])
    img = normalize(img)
    label = one_hot_label(label,config["num_classes"])
    return img,label
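To see what normalization and one-hot encoding do to actual values, here is a small NumPy sketch. It is only an illustration of the same idea, not the TensorFlow code above, and the function names `normalize_np` and `one_hot_np` and the tiny sample image are made up for this example.

```python
import numpy as np

def normalize_np(img):
    """Scale uint8 pixel values from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0

def one_hot_np(labels, num_classes):
    """Pick rows of the identity matrix: row k is the one-hot vector for class k."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# A made-up 2x2 "image" and a single label
img = np.array([[0, 128], [255, 64]], dtype=np.uint8)
label = np.array([9])

scaled = normalize_np(img)
encoded = one_hot_np(label, num_classes=10)

print(scaled)
print(encoded[0])   # ten entries, with a single 1 in the last position
```

Note that the one-hot vector has exactly num_classes entries, so label 9 puts the 1 in the tenth position.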
After building the preprocessing functions, we can apply them to each of our datasets.
normalized_train_dataset = (
    train_data
    .map(lambda img,label: img_normalize_one_hot(img,label,config),num_parallel_calls=tf.data.AUTOTUNE)
    .cache()
    # Shuffle individual samples before batching so every epoch sees different batches
    .shuffle(buffer_size=config["buffer_size"])
    .batch(config["batch_size"])
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
normalized_valid_dataset = (
    valid_data
    .map(lambda img,label : img_normalize_one_hot(img,label,config),num_parallel_calls=tf.data.AUTOTUNE)
    .batch(config["batch_size"])
    .cache()
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
# The test set is never shuffled, so predictions stay aligned with the labels
normalized_test_dataset = (
    test_data
    .map(lambda img,label : img_normalize_one_hot(img,label,config),num_parallel_calls=tf.data.AUTOTUNE)
    .batch(config["batch_size"])
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
This code looks complicated, but it is really simple. Each of our sets is preprocessed with the functions defined earlier. Then we add a few tf.data methods, such as cache, shuffle, batch, and prefetch, to optimize how the data is fed to the model.
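One detail worth knowing about these methods is that the relative order of shuffle and batch matters. The sketch below is a plain-Python illustration with a hypothetical `batch` helper and eight toy samples; it does not use tf.data. Shuffling before batching mixes samples across batches, while shuffling after batching only reorders batches whose contents stay fixed.

```python
import random

def batch(items, batch_size):
    """Group consecutive items into lists of batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

samples = list(range(8))
rng = random.Random(0)

# Shuffle, then batch: samples can land in any batch
shuffled = samples[:]
rng.shuffle(shuffled)
shuffle_then_batch = batch(shuffled, 4)

# Batch, then shuffle: only the order of the batches changes
batch_then_shuffle = batch(samples, 4)
rng.shuffle(batch_then_shuffle)

print("shuffle -> batch:", shuffle_then_batch)
print("batch -> shuffle:", batch_then_shuffle)
```

In the second case, samples 0-3 always end up together in one batch and samples 4-7 in the other, whatever the random seed.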
Building Callback
In TensorFlow, we can use callbacks to customize the training process. The callbacks we use in this training are early stopping, checkpointing, and TensorBoard.
early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=config["earlystopping_patience"])
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("model_cb.keras",save_best_only=True)
logs = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir = logs,
                                                histogram_freq = 1,
                                                profile_batch = '500,520')
With these three callbacks, training stops once the validation loss stops improving, the best model is saved to disk, and logs are written for TensorBoard.
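To illustrate what the patience setting means, here is a simplified sketch of the early-stopping rule in plain Python. It is an approximation of the idea, not the Keras implementation: the function name `early_stop_epoch` and the loss values are made up, and it assumes the monitored value is a validation loss where lower is better.

```python
def early_stop_epoch(val_losses, patience):
    """Return the (0-based) epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None              # never ran out of patience

# Made-up validation losses: the best value is reached at epoch 1
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.30]
stop_at = early_stop_epoch(losses, patience=3)
print(stop_at)
```

With patience = 3, training stops after three consecutive epochs without improvement, so the later value 0.30 is never reached. This is the trade-off of a small patience value.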
Initialize wandb
run = wandb.init(
    project = "lenet-5-scratch",
    config = config
)
We use wandb to explore our model's performance. Here, project is the project name and config is the training configuration defined earlier.
Building Model from scratch
There are three ways to define a model: the Sequential API, the Functional API, and the class API (model subclassing). In this case, we use the class API to understand how to build the model from scratch.
Based on the image in the intro section, we know that LeNet-5 consists of several layers. For better understanding, I have written them in the table below.
Layer | Configuration | Description |
Input Layer | 28×28×1 | The first layer |
conv layer 1 | filter = 6, kernel_size (5×5), padding=2 | First conv layer |
Average Pooling layer | strides (2,2) | First pooling layer |
conv layer 2 | filter = 16, kernel_size (5×5) | Second conv layer |
Average Pooling layer | strides (2,2) | Second pooling layer |
Dense layer 1 | 120 Neurons | FC 1 layer |
Dense layer 2 | 84 Neurons | FC 2 layer |
Output Layer | 10 Neurons | Output Layer |
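As a sanity check on the table, the number of trainable parameters can be worked out by hand. The sketch below assumes that every convolution filter sees all input channels (as a standard Keras Conv2D layer does), that each layer has a bias term, and that the flattened feature map has 5 × 5 × 16 = 400 values. The helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights per filter (kernel x kernel x in_channels) plus one bias, times filters."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

layer_params = {
    "conv1": conv_params(5, 1, 6),          # (25 + 1) * 6
    "conv2": conv_params(5, 6, 16),         # (150 + 1) * 16
    "dense1": dense_params(5 * 5 * 16, 120),
    "dense2": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count}")
print("total:", total_params)
```

The pooling and flatten layers have no trainable parameters, so they do not appear in the count. Most of the parameters sit in the first dense layer.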
From the table, we can construct the model in TensorFlow as below.
class lenet_5(tf.keras.Model):
    def __init__(self,input_shape,outputs, **kwargs):
        super().__init__(**kwargs)
        # input_shape is kept for reference only; Keras infers the shape from the first batch
        # padding="same" with a 5x5 kernel is equivalent to padding = 2
        self.conv1 = tf.keras.layers.Conv2D(filters=6,strides=(1,1),kernel_size=(5,5),padding="same",activation="tanh")
        self.pool1 = tf.keras.layers.AveragePooling2D(pool_size=(2,2),strides=(2,2))
        self.conv2 = tf.keras.layers.Conv2D(filters=16,strides=(1,1),kernel_size=(5,5),activation="tanh")
        self.pool2 = tf.keras.layers.AveragePooling2D(pool_size=(2,2),strides=(2,2))
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(units=120,activation="tanh")
        self.dense2 = tf.keras.layers.Dense(units=84,activation="tanh")
        self.output_layer = tf.keras.layers.Dense(units=outputs,activation="softmax")

    def call(self,inputs):
        x = self.conv1(inputs)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.output_layer(x)
        return x

model = lenet_5((28,28,1),10)
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.SGD(learning_rate=config["learning_rate"]),
    metrics=['accuracy'],
)
We can see that the activation function is tanh, which is rarely used in hidden layers today, but it was widely used at the time.
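One commonly cited reason tanh is less popular today is saturation: its gradient, 1 − tanh²(x), shrinks toward zero for large inputs, while the gradient of ReLU stays at 1 for any positive input. The sketch below is a small illustration of that point using only the standard library. The function names are made up for this example.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (0.0, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.6f}, relu'={relu_grad(x):.1f}")
```

For a shallow network like LeNet-5 this matters less than it does for deep networks, where small gradients are multiplied across many layers.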
Training the model
To train the model, we can simply run the code below.
history = model.fit(normalized_train_dataset,validation_data=normalized_valid_dataset,
                    epochs=config["epochs"],callbacks=[early_stopping_cb,checkpoint_cb,
                    tensorboard_cb,WandbMetricsLogger(log_freq=10)])
The code above uses the config parameters that we defined earlier.
Evaluate Performance
After training the model, we need to check the performance of the model. First we can plot the loss and accuracy
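plt.figure()
plt.plot(history.history["accuracy"], label="train")
plt.plot(history.history["val_accuracy"], label="validation")
plt.title("Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
# Plot the loss in a separate figure so the two curves do not share an axis
plt.figure()
plt.plot(history.history["loss"], label="train")
plt.plot(history.history["val_loss"], label="validation")
plt.title("Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()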
Evaluate on Testing Dataset
Besides monitoring performance on the training and validation sets, we need to evaluate the model on the test set.
model.evaluate(normalized_test_dataset)
# output [0.10710800439119339, 0.9684000015258789]
From the model evaluation, the loss on the test set is about 0.107 and the accuracy is about 0.968. We can still improve it by tuning the training parameters, for example by increasing the number of epochs.
Plot The Predicted Image
To gain more information about the performance, we can plot the predictions on the test set. First, we need to get the predictions from the model and the actual labels from the test dataset.
# Generate predictions
y_pred_prob = model.predict(normalized_test_dataset) # Predicted probabilities
y_pred = np.argmax(y_pred_prob, axis=1) # Convert probabilities to class indices
# Extract true labels from dataset and convert to class indices
y_true = np.concatenate([np.argmax(label.numpy(), axis=1) for _, label in normalized_test_dataset])
print(f"y_true shape: {y_true.shape}, unique values: {np.unique(y_true)}")
print(f"y_pred shape: {y_pred.shape}, unique values: {np.unique(y_pred)}")
num_uniq,count = np.unique(y_pred,return_counts=True)
print(num_uniq,count)
num_uniq,count = np.unique(y_true,return_counts=True)
print(num_uniq,count)
# Validate shapes
print(f"y_pred shape: {y_pred.shape}, y_true shape: {y_true.shape}")
After getting the labels, we plot 16 images and compare the predicted and actual labels. This code also uses a wandb Table to save the predicted images, so we can store and evaluate them on wandb.
img_num = 0
IMG_PLOT = 16
plt.figure(figsize=(6,6))
table = wandb.Table(columns=["Image", "True Label", "Predicted Label"])
# The test dataset is not shuffled, so the first batch lines up with the first entries of y_pred
for img_batches, label in normalized_test_dataset.take(1):
    batches = img_batches.shape[0]  # number of images in this batch
    for b in range(batches):
        if img_num >= IMG_PLOT:
            break
        ax = plt.subplot(4,4,img_num+1)
        img = img_batches[b].numpy().squeeze()
        plt.imshow(img)
        plt.title(f"actual:{np.argmax(label[b].numpy())},pred :{y_pred[b]}",fontsize=6)
        img_num+=1
        table.add_data(wandb.Image(img), np.argmax(label[b].numpy()), y_pred[b])
plt.show()
run.log({"predictions_table": table})
run.finish()
Evaluate using Confusion Matrix
In a classification task, the confusion matrix is computed to measure how well the model performs on each class.
# Compute confusion matrix
confusion_matrix = tf.math.confusion_matrix(
    y_true,
    y_pred,
    num_classes=config["num_classes"]
)
# Display confusion matrix
print("Confusion Matrix:")
print(confusion_matrix.numpy())
To visualize the confusion matrix, you can use seaborn, and you can generate a classification report with sklearn.
import seaborn as sns
plt.figure()
sns.heatmap(confusion_matrix.numpy(),annot=True,fmt="d",cmap="Blues")
plt.xlabel("Predicted Label")
plt.ylabel("Actual Label")
plt.show()
from sklearn.metrics import classification_report
report = classification_report(y_true,y_pred,target_names=[f"Class {i}" for i in range(config["num_classes"])])
print("Classification Report")
print(report)
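To see what the confusion matrix and the per-class metrics actually compute, here is a minimal NumPy sketch on a tiny made-up example with three classes. The function name `confusion_matrix_np` and the label arrays are hypothetical. Rows are the actual labels and columns are the predicted labels, the same layout as in the heatmap above.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count how often actual class t (row) was predicted as class p (column)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for three classes
true_labels = np.array([0, 0, 1, 1, 2, 2])
pred_labels = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix_np(true_labels, pred_labels, num_classes=3)

# Precision: correct predictions / everything predicted as that class (column sums)
precision = cm.diagonal() / cm.sum(axis=0)
# Recall: correct predictions / everything that truly is that class (row sums)
recall = cm.diagonal() / cm.sum(axis=1)
accuracy = cm.diagonal().sum() / cm.sum()

print(cm)
print("precision:", precision)
print("recall:", recall)
print("accuracy:", accuracy)
```

The diagonal holds the correct predictions, so a good model produces a matrix whose off-diagonal entries are close to zero.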
Evaluation Using Tensorboard
TensorFlow provides TensorBoard to visualize and evaluate training. It is really helpful for gaining information about the model and how to achieve better performance. First, we need to move the log files.
import pathlib
log_files = pathlib.Path('./logs').glob('*/*/events.out.tfevents*')
for file in log_files:
    !mv {file} {file.parent.parent}
Then use the following code to run TensorBoard.
%load_ext tensorboard
%tensorboard --logdir=logs
Some of the information TensorBoard provides for measuring performance is shown in the following images.
Feel free to explore and discover more about TensorBoard.
Evaluate on Wandb
Wandb is a good MLOps tool for monitoring models in ML pipelines. In this case we only train a small model, which does not make full use of wandb, but using it is still good practice. Some of the features of wandb are shown in the images below.
Summary
In summary, we have built LeNet-5 from scratch as the beginning of our journey in computer vision. LeNet-5 is one of the foundations of computer vision, but many more recent algorithms with higher computational cost and better accuracy have since been developed. We will talk about them later. You may also be confused about using TensorFlow, TensorBoard, and wandb. We will break those down in more detail later.
Full code :