Build and Understand LeNet 5 Architecture Using TensorFlow


Machine learning, especially in computer vision, is developing rapidly. To keep up with that pace we must keep learning from the basics. One of those basics is LeNet, developed by Yann LeCun. In this post we will explore the LeNet architecture.

Introduction

LeNet is a family of convolutional neural networks originally applied to the handwritten-digit dataset known as MNIST. The idea behind LeNet is to build a CNN-based architecture that can classify images in an image classification task.

LeNet-5

LeNet-5 is the last iteration of the LeNet architecture. LeNet looks shallower than modern architectures because of the limited compute available at the time. These days we have far more computational capability: faster CPUs, more RAM, and rapidly improving GPUs. Back then, however, LeNet had to be designed around those computational constraints.

The architecture of LeNet-5 consists of Conv (convolutional) layers and FC (fully connected) layers. The dataset consists of images with shape 28 × 28 × 1, with 60,000 training samples and 10,000 testing samples, which is sufficient for training the model. The model is built from the following steps (a feature-map size walkthrough follows the list):

  1. Input layer

    Contains the input image, which is 28×28×1

  2. Conv layer 1

    A convolutional layer with a 5 × 5 kernel, padding = 2, and 6 filters

  3. Average pooling

    To reduce the size of the feature map, a pooling layer is used that averages over each window. The pooling layer uses stride = 2

  4. A repeated pair of convolutional and pooling layers

  5. Fully connected layers

    The FC part consists of 3 layers with 120, 84, and 10 neurons. The last layer has 10 neurons because it classifies the handwritten digits 0-9

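To make the steps above concrete, here is a rough walkthrough of the feature-map sizes, assuming the 28×28×1 MNIST input and the layer settings listed above and in the table later in this post (this is just the arithmetic written out, not output from any code):

# Feature-map size walkthrough for LeNet-5 on a 28x28x1 input
# Conv 1 : 5x5 kernel, padding = 2, 6 filters   -> 28 x 28 x 6
# Pool 1 : 2x2 average pooling, stride 2        -> 14 x 14 x 6
# Conv 2 : 5x5 kernel, no padding, 16 filters   -> 10 x 10 x 16
# Pool 2 : 2x2 average pooling, stride 2        -> 5 x 5 x 16
# Flatten                                       -> 400 values
# Dense  : 120 -> 84 -> 10 (one output neuron per digit class)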

To gain more hands-on knowledge, let's build the LeNet-5 model using TensorFlow.

Working with Tensorflow

The code below was run in Google Colab.

Installing Dependencies

!pip install tensorflow==2.15
!pip install -U tensorboard_plugin_profile
!pip install wandb

To make everything work properly, I am using TensorFlow 2.15 and TensorBoard for evaluating performance. I also use wandb to track the model, which is good practice even with a simple model like LeNet-5.

Import Libraries

import tensorflow as tf
import numpy as np  # used later for label handling and plotting
import matplotlib.pyplot as plt
import os, datetime
import wandb
from wandb.integration.keras import WandbMetricsLogger
print(tf.__version__)

We need all the libraries above, from loading the data to visualizing performance. To speed up training we can also use the GPU provided in Google Colab. To check GPU availability we can use the code below.

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)

Also, don't forget to log in to wandb.

wandb.login(key="YOUR_KEY")

Importing Dataset

The MNIST dataset is already available in the tf.keras.datasets module. To download it, run the code below.

dataset = tf.keras.datasets.mnist
(X_train,y_train), (X_test,y_test) = dataset.load_data()
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

The output gives the shapes of the training and testing data, which contain 60,000 and 10,000 samples respectively.
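
For reference, the printed shapes should look roughly like this (the standard MNIST split):

# Expected output:
# (60000, 28, 28) (60000,)
# (10000, 28, 28) (10000,)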

Splitting Data and converting to TF.data

In TensorFlow, the recommended way to optimize the data pipeline is tf.data, which can speed up training. tf.data optimizes memory usage and helps avoid bottlenecks in the training pipeline. To do this you can try the code below.

num_train_data = X_train.shape[0]
num_test_data = X_test.shape[0]
train_data = tf.data.Dataset.from_tensor_slices((X_train,y_train))
test_dataset = tf.data.Dataset.from_tensor_slices((X_test,y_test))
valid_data = test_dataset.take(int(0.5*num_test_data))
test_data = test_dataset.skip(int(0.5*num_test_data))

In this code we first count the number of samples in each set (training, testing) so we can split the data. Then we convert the data from NumPy arrays to tf.data datasets using the from_tensor_slices method. Finally we split the original test set 50/50 into validation and testing data using .take and .skip.
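
If you want to double-check the split, a quick optional sanity check is to print the number of samples in each tf.data dataset:

# Optional: verify the sizes of each dataset (the 50/50 split gives 5,000 each)
print(train_data.cardinality().numpy())   # 60000
print(valid_data.cardinality().numpy())   # 5000
print(test_data.cardinality().numpy())    # 5000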

Data Distribution

In real-world cases the data is often not distributed equally, so we need to check whether the class distribution is balanced or imbalanced.


def distribution_class(train_labels, test_labels):
    # Combine the train and test labels, then count how often each class appears
    labels = np.concatenate((train_labels, test_labels), axis=0)
    unique_labels, counts = np.unique(labels, return_counts=True)
    plt.bar(unique_labels, counts)
    plt.show()
    return unique_labels

unique_label = distribution_class(y_train, y_test)

This function takes the NumPy arrays from the beginning instead of using tf.data. Why? Because with tf.data it would be slower, since the data is fetched in batches instead of all at once. The output of the cell will look like this.

From the distribution we can see that the classes are balanced.

Data Visualization

Before feeding the data to the model, we would like to inspect it to understand the problem and the behavior of the data, in our case handwritten digits. To visualize it we can use the code below.

for i, (img, label) in enumerate(train_data.take(16)):
    ax = plt.subplot(4,4,i+1)
    plt.imshow(img)
    plt.title(label.numpy())

In this code we take 16 samples from the training data and plot them with matplotlib. The output should look like this.

The label is shown above each image, and the labels match the images.

Configure Parameters

In this case we will configure only a few parameters of the training pipeline, such as num_classes, batch_size, image_size, etc. The following code builds the config dictionary.

config = {
    "num_classes" : 10,
    "batch_size" : 64,
    "image_size" : 28,
    "image_channels" : 1,
    "buffer_size" : 10000,
    "learning_rate" : 1e-3,
    "epochs" : 200,
    "earlystopping_patience" : 3
}

Preprocessing Data

Several methods are used to preprocess the data: normalization, reshaping, and one-hot encoding. We normalize the data to bring the pixel values (0-255) into a well-behaved range. We reshape the data so every image has the same size expected by the model. One-hot encoding is needed to turn a label (0-9) into a vector such as [0,0,0,0,0,0,0,0,0,1] (representing label 9). With this encoding, the 10 output neurons each represent one class label.
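
As a quick illustration of the encoding (not part of the pipeline itself), tf.one_hot produces exactly this kind of vector:

# Label 9 with 10 classes -> [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
print(tf.one_hot(9, depth=10).numpy())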

def normalize(img):
    # Scale pixel values from [0, 255] to [0, 1]
    img = tf.cast(img, tf.float32) / 255.0
    return img

def one_hot_label(labels, num_classes):
    # Turn an integer label into a one-hot vector of length num_classes
    label = tf.one_hot(labels, depth=num_classes)
    return label

def reshape_img(img, shape, channels):
    # Add the channel dimension (28, 28) -> (28, 28, 1) and resize to the target shape
    img = tf.expand_dims(img, axis=-1)
    img = tf.image.resize(img, (shape, shape))
    return img

def img_normalize_one_hot(img, label, config):
    img = reshape_img(img, config["image_size"], config["image_channels"])
    img = normalize(img)
    label = one_hot_label(label, config["num_classes"])
    return img, label

After building the preprocessing functions, we can apply them to each of our datasets.

normalized_train_dataset = (
    train_data
    .map(lambda img, label: img_normalize_one_hot(img, label, config), num_parallel_calls=tf.data.AUTOTUNE)
    .batch(config["batch_size"])
    .cache()
    .shuffle(buffer_size=config["buffer_size"])
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
normalized_valid_dataset = (
    valid_data
    .map(lambda img, label: img_normalize_one_hot(img, label, config), num_parallel_calls=tf.data.AUTOTUNE)
    .batch(config["batch_size"])
    .cache()
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)
normalized_test_dataset = (
    test_data
    .map(lambda img, label: img_normalize_one_hot(img, label, config), num_parallel_calls=tf.data.AUTOTUNE)
    .batch(config["batch_size"])
    .prefetch(buffer_size=tf.data.AUTOTUNE)
)

This code may look complicated, but it is really simple. Each of our sets is preprocessed with the pipeline we built earlier. Then we chain a few tf.data methods to optimize the data for consumption by the model: batch, cache, shuffle, and prefetch.
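
As an optional sanity check (not required for training), we can inspect one batch coming out of the training pipeline to confirm the shapes match the config:

# Each batch should be (batch_size, 28, 28, 1) images and (batch_size, 10) one-hot labels
print(normalized_train_dataset.element_spec)
for imgs, labels in normalized_train_dataset.take(1):
    print(imgs.shape, labels.shape)  # e.g. (64, 28, 28, 1) (64, 10)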

Building Callback

In TensorFlow we can use callbacks to customize the training process. The callbacks we use in this training are early stopping, model checkpointing, and TensorBoard.

early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=config["earlystopping_patience"])
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("model_cb.keras",save_best_only=True)
logs = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir = logs,
                                                 histogram_freq = 1,
                                                 profile_batch = '500,520')

With these 3 callbacks we can make sure our training runs without problems.

Initialize wandb

run = wandb.init(
    project = "lenet-5-scratch",
    config = config
)

We use wandb to explore the model's performance. project is the project name and config is the configuration of the training process.

Building Model from scratch

There are 3 ways to define a model: the Sequential API, the functional API, and model subclassing (the class API). In this case we use the class API to understand how to build the model from scratch.

Based on the image shown in the intro section, we know that LeNet-5 consists of several layers. For a better understanding, I have written them out in the table below.

| Layer                 | Configuration                              | Description          |
| --------------------- | ------------------------------------------ | -------------------- |
| Input layer           | 28×28                                      | The first layer      |
| Conv layer 1          | filters = 6, kernel_size (5×5), padding = 2 | First conv layer     |
| Average pooling layer | strides (2,2)                              | First pooling layer  |
| Conv layer 2          | filters = 16, kernel_size (5×5)             | Second conv layer    |
| Average pooling layer | strides (2,2)                              | Second pooling layer |
| Dense layer 1         | 120 neurons                                | FC 1 layer           |
| Dense layer 2         | 84 neurons                                 | FC 2 layer           |
| Output layer          | 10 neurons                                 | Output layer         |

From this table we can construct the model in TensorFlow as follows.

class lenet_5(tf.keras.Model):
    def __init__(self, input_shape, outputs, **kwargs):
        super().__init__(**kwargs)
        # padding="same" with a 5x5 kernel pads 2 pixels on each side, matching the table above
        self.conv1 = tf.keras.layers.Conv2D(filters=6, strides=(1,1), kernel_size=(5,5), padding="same", activation="tanh")
        self.pool1 = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))
        self.conv2 = tf.keras.layers.Conv2D(filters=16, strides=(1,1), kernel_size=(5,5), activation="tanh")
        self.pool2 = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(units=120, activation="tanh")
        self.dense2 = tf.keras.layers.Dense(units=84, activation="tanh")
        self.output_layer = tf.keras.layers.Dense(units=outputs, activation="softmax")

    def call(self, inputs):
        x = self.conv1(inputs)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dense2(x)
        x = self.output_layer(x)
        return x

model = lenet_5((28,28,1), 10)
model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.SGD(learning_rate=config["learning_rate"]), metrics=['accuracy'])

We can see that the activation function is tanh, which is rarely used today. Back then, however, tanh was a very common choice.
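
For comparison, and since we mentioned the three ways of defining a model, here is a minimal sketch of the same layer stack using the Sequential API (the variable name sequential_lenet_5 is just for illustration):

# A Sequential-API sketch of the same LeNet-5 layout
sequential_lenet_5 = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(6, kernel_size=(5, 5), padding="same", activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Conv2D(16, kernel_size=(5, 5), activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
sequential_lenet_5.summary()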

Training the model

To train the model we can simply run the code below.

history = model.fit(normalized_train_dataset,validation_data=normalized_valid_dataset,
                    epochs=config["epochs"],callbacks=[early_stopping_cb,checkpoint_cb,
                                          tensorboard_cb,WandbMetricsLogger(log_freq=10)])

The code above uses the config parameters that we defined earlier.

Evaluate Performance

After training, we need to check the model's performance. First we can plot the accuracy and the loss.

plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend(["train", "validation"])
plt.show()

plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(["train", "validation"])
plt.show()

Evaluate on Testing Dataset

Besides monitoring the performance on the training and validation sets, we need to evaluate the model on the test set.

model.evaluate(normalized_test_dataset)
# output [0.10710800439119339, 0.9684000015258789]

From the model evaluation, the loss on the testing set is about 0.107 and the accuracy is about 0.968. We can still improve it by tuning the training parameters, for example by increasing the number of epochs.

Plot The Predicted Image

To gain more insight into the performance, we can plot some predictions from the testing set. First we need to get the predictions from the model and the actual labels from the testing dataset.

# Generate predictions
y_pred_prob = model.predict(normalized_test_dataset)  # Predicted probabilities
y_pred = np.argmax(y_pred_prob, axis=1)  # Convert probabilities to class indices

# Extract true labels from dataset and convert to class indices
y_true = np.concatenate([np.argmax(label.numpy(), axis=1) for _, label in normalized_test_dataset])
print(f"y_true shape: {y_true.shape}, unique values: {np.unique(y_true)}")
print(f"y_pred shape: {y_pred.shape}, unique values: {np.unique(y_pred)}")

num_uniq,count = np.unique(y_pred,return_counts=True)
print(num_uniq,count)

num_uniq,count = np.unique(y_true,return_counts=True)
print(num_uniq,count)

# Validate shapes
print(f"y_pred shape: {y_pred.shape}, y_true shape: {y_true.shape}")

After getting the labels, we plot 16 images and compare the predicted and actual labels. This code also uses a wandb Table to save the prediction images so we can store and evaluate them on wandb.

img_num = 0
IMG_PLOT = 16
plt.figure(figsize=(6,6))
table = wandb.Table(columns=["Image", "True Label", "Predicted Label"])
for img_batches, label in normalized_test_dataset.take(1):
  batches = img_batches.shape[0]
  for b in range(batches):
    if img_num >= IMG_PLOT:
      break
    ax = plt.subplot(4,4,img_num+1)
    img = img_batches[b].numpy().squeeze()
    plt.imshow(img)
    plt.title(f"actual:{np.argmax(label[b].numpy())},pred :{y_pred[b]}",fontsize=6)
    img_num+=1
    table.add_data(wandb.Image(img), np.argmax(label[b].numpy()), y_pred[b])

run.log({"predictions_table": table})
run.finish()

Evaluate using Confusion Matrix

In a classification task, the confusion matrix should be computed to measure the performance of the model.

# Compute confusion matrix
confusion_matrix = tf.math.confusion_matrix(
    y_true,
    y_pred,
    num_classes=config["num_classes"]
)

# Display confusion matrix
print("Confusion Matrix:")
print(confusion_matrix.numpy())

To visualize the confusion matrix you can use seaborn, and you can build a classification report with sklearn.

import seaborn as sns
plt.figure()
sns.heatmap(confusion_matrix.numpy(),annot=True,fmt="d",cmap="Blues")
plt.xlabel("Predicted Label")
plt.ylabel("Actual Label")
plt.show()

from sklearn.metrics import classification_report
report = classification_report(y_true,y_pred,target_names=[f"Class {i}" for i in range(config["num_classes"])])
print("Classification Report")
print(report)

Evaluation Using Tensorboard

TensorFlow ships with TensorBoard to visualize and evaluate training. It is really helpful for gaining insight into the model and how to achieve better performance. First we need to move the log files.

import pathlib
log_files = pathlib.Path('./logs').glob('*/*/events.out.tfevents*')
for file in log_files:
  !mv {file} {file.parent.parent}

Then use following code to run tensorboard

%load_ext tensorboard
%tensorboard --logdir=logs

Some of the information that can be used to measure performance with TensorBoard is shown in the following images.

Feel free to explore and discover more about TensorBoard.

Evaluate on Wandb

Wandb is a good MLOps tool for monitoring models in ML pipelines. In this case we only train a small model, which does not use wandb to its full potential, but it is still good practice. Some of wandb's features are shown in the images below.

Summary

In summary, we have been able to build LeNet-5 from scratch as the beginning of our journey in computer vision. LeNet-5 is one of the foundations of computer vision, but many algorithms with higher computational cost and better accuracy have since been developed. We will talk about them later. You may also be confused about using TensorFlow, TensorBoard, and wandb together; we will break these topics down further in later posts.

Full code :

Code