This notebook was created by Moussa Jamor
Notebook link (Github): AutoEncoders in Deep Learning
AutoEncoders Architecture in Deep Learning¶
1. What are AutoEncoders?¶
AutoEncoders are a special type of neural network used for unsupervised learning. They are composed of two main components, the Encoder and the Decoder, both of which are neural networks. In this notebook, you will find everything you need to know about AutoEncoders: the theory, how to build an AutoEncoder model with PyTorch on the MNIST dataset, and a look at some of AutoEncoders' applications.
AutoEncoders were first proposed by G. E. Hinton and R. R. Salakhutdinov in the paper titled Reducing the Dimensionality of Data with Neural Networks, where they were presented as a non-linear generalisation of PCA for dimensionality reduction. Since then, AutoEncoders have been widely used in other applications: Transfer Learning, Generative Models, Anomaly Detection, and more.
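Before building the full model, here is a minimal sketch of the idea (a toy MLP autoencoder, not the CNN model we build later in this notebook; the layer sizes are illustrative assumptions): the encoder compresses the flattened 28x28 image into a small code, and the decoder tries to reconstruct the pixels from that code.
import torch.nn as nn

# Toy MLP autoencoder sketch (illustrative only; the notebook builds a CNN version below)
toy_autoencoder = nn.Sequential(
    nn.Linear(28 * 28, 32),   # encoder: compress 784 pixels to a 32-d code
    nn.ReLU(),
    nn.Linear(32, 28 * 28),   # decoder: reconstruct the 784 pixels from the code
    nn.Sigmoid(),             # keep pixel values in [0, 1]
)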
2. Prepare the MNIST Dataset¶
Let's build and train our first AutoEncoder model. In this section we will define the model's architecture and use the well-known MNIST dataset, which is available through the torchvision
library; for further information see the torchvision MNIST documentation.
Let's start by importing the necessary libraries, beginning with PyTorch
🔥.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split, Subset
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.metrics import classification_report
Define Some Helper Functions¶
In this section, we define some helper functions that we're going to use in the coming sections.
# Helper function to plot a single handwritten digit image
def imshow_image(tensor: torch.Tensor) -> None:
    plt.imshow(tensor.detach().view(28, 28).numpy())

# Helper function to plot the first few images of a batch side by side
def plot_subplots(
    images: torch.Tensor,
) -> None:
    max_cols = 8
    fig, axes = plt.subplots(1, max_cols, figsize=(18, 2))
    num_cols = min(images.shape[0], max_cols)
    img = images.detach().numpy()
    for col in range(num_cols):
        axes[col].imshow(img[col])
    plt.show()
Load MNIST Dataset¶
To download and load the MNIST dataset, we use the built-in MNIST
class from torchvision.datasets
. Then we specify some transformations: transforms.ToTensor()
converts each image to a tensor, and since each image is a 28x28 matrix, nn.Flatten
transforms this matrix into a vector of 784 elements.
!mkdir mnist
mkdir: cannot create directory ‘mnist’: File exists
mnist = datasets.MNIST('./mnist/', download=True,
transform= transforms.Compose([
transforms.ToTensor(),
nn.Flatten()
])
)
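As a quick, optional sanity check, we can inspect the shape of one transformed sample to confirm we get the flattened 1x784 vector described above:
sample, label = mnist[0]
print(sample.shape)  # expected: torch.Size([1, 784])
print(label)         # the corresponding digit class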
After the MNIST dataset is downloaded and loaded successfully, we can visualize it using one of the helper functions, imshow_image
, which we defined at the beginning.
imshow_image(mnist[0][0])
Split MNIST Dataset¶
In this section we split the MNIST dataset into a training set with 50000 images and a test set with 10000 images.
train_data, test_data = random_split(mnist, [50000, 10000])
To train our model, we have to create a DataLoader for each of the train/test datasets, with batch_size = 25
for training. For the test dataset, we use the whole dataset as a single batch.
batch_size = 25
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=10000, shuffle=True)
input, _ = next(iter(train_loader))
After creating the DataLoader, let's plot the first 8 handwritten digits using the plot_subplots
function.
plot_subplots(input.squeeze(1).view(-1, 28, 28))
3. Build The AutoEncoders Model¶
This is the most exciting part: we're going to build our first AutoEncoder model with PyTorch 🔥. As explained in the previous parts, AutoEncoders have two main components and building blocks: the Encoder and the Decoder.
Note:
- There is no standard way to build an AutoEncoder architecture, which means we can use a vanilla MLP, conv layers, RNNs, etc.
- In our case we'll use a CNN, because it performs better for images.
Let's start with the Encoder. Are you excited? Of course you are 😊.
Define The Encoder Model¶
Here we create an Encoder
class that extends nn.Module
and contains two main sub-models. The conv_encoder
is a feature extractor; we then feed its output to a linear_encoder
that returns the embedding representation of the input. At first, the model is initialized randomly, but it will get better and better during the training phase.
class Encoder(nn.Module):
    def __init__(self, input_size: int = 28*28, embeddings_size: int = 10) -> None:
        super(Encoder, self).__init__()
        # Convolutional feature extractor: (N, 1, 28, 28) -> (N, 4*18*18)
        self.conv_encoder = nn.Sequential(
            nn.Conv2d(1, 10, 3),   # -> (N, 10, 26, 26)
            nn.Conv2d(10, 8, 3),   # -> (N, 8, 24, 24)
            nn.ReLU(),
            nn.Conv2d(8, 4, 7),    # -> (N, 4, 18, 18)
            nn.Flatten(1, -1),     # -> (N, 1296)
        )
        # Linear head: maps the extracted features to the embedding vector
        self.linear_encoder = nn.Sequential(
            nn.Linear(4 * 18 * 18, 128),
            nn.Linear(128, 32),
            nn.ReLU(),
            nn.Linear(32, embeddings_size)
        )

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Reshape flattened inputs back to image form before the conv layers
        out = input.view(-1, 1, 28, 28)
        out = self.conv_encoder(out)
        return self.linear_encoder(out)
Let's create an instance of the Encoder
class to see how it works when we feed it images as input.
encoder = Encoder()
input, _ = next(iter(train_loader))
The following code gives the dense vector representation of the first image. Next, let's create the Decoder class.
encoder(input)[0]
tensor([ 0.0258, 0.1940, -0.1093, 0.0069, 0.0537, 0.1174, -0.0988, 0.1132, 0.1323, 0.1360], grad_fn=<SelectBackward0>)
Define The Decoder Model¶
The Decoder
class is created by extending nn.Module
as well. The Decoder is simply the inverse architecture of the Encoder: for the convolutional part, we use nn.ConvTranspose2d
to perform the inverse operation of nn.Conv2d
.
class Decoder(nn.Module):
    def __init__(self, input_size: int = 10, embeddings_size: int = 10):
        super(Decoder, self).__init__()
        # Linear part: expands the embedding back to the conv feature map size
        self.linear_decoder = nn.Sequential(
            nn.Linear(embeddings_size, 32),
            nn.ReLU(),
            nn.Linear(32, 128),
            nn.ReLU(),
            nn.Linear(128, 4 * 18 * 18),
            nn.Unflatten(1, (4, 18, 18))    # -> (N, 4, 18, 18)
        )
        # Transposed convolutions: (N, 4, 18, 18) -> (N, 1, 28, 28)
        self.conv_decoder = nn.Sequential(
            nn.ConvTranspose2d(4, 8, 7),    # -> (N, 8, 24, 24)
            nn.ReLU(),
            nn.ConvTranspose2d(8, 10, 3),   # -> (N, 10, 26, 26)
            nn.ReLU(),
            nn.ConvTranspose2d(10, 1, 3),   # -> (N, 1, 28, 28)
        )

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        out = self.linear_decoder(input)
        return self.conv_decoder(out.view(-1, 4, 18, 18))
Let's create an instance of the Decoder
class, then feed the model a random vector to construct an image.
decoder = Decoder()
img = decoder(torch.randn(1, 10))
imshow_image(img)
4. Build The AutoEncoder Model¶
To create an AutoEncoders
model, we compose the Encoder
and the Decoder
classes.
Finally, we did it.
class AutoEncoders(nn.Module):
def __init__(self, input_size=28*28, embeddings_size=100) -> None:
super(AutoEncoders, self).__init__()
self.encoder = Encoder(input_size, embeddings_size)
self.decoder = Decoder(embeddings_size=embeddings_size)
def forward(self, input: torch.Tensor) -> torch.Tensor:
out = self.encoder(input)
return self.decoder(out)
Let's create an instance of the AutoEncoders
model and see the whole architecture.
model = AutoEncoders()
model
AutoEncoders(
  (encoder): Encoder(
    (conv_encoder): Sequential(
      (0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
      (1): Conv2d(10, 8, kernel_size=(3, 3), stride=(1, 1))
      (2): ReLU()
      (3): Conv2d(8, 4, kernel_size=(7, 7), stride=(1, 1))
      (4): Flatten(start_dim=1, end_dim=-1)
    )
    (linear_encoder): Sequential(
      (0): Linear(in_features=1296, out_features=128, bias=True)
      (1): Linear(in_features=128, out_features=32, bias=True)
      (2): ReLU()
      (3): Linear(in_features=32, out_features=100, bias=True)
    )
  )
  (decoder): Decoder(
    (linear_decoder): Sequential(
      (0): Linear(in_features=100, out_features=32, bias=True)
      (1): ReLU()
      (2): Linear(in_features=32, out_features=128, bias=True)
      (3): ReLU()
      (4): Linear(in_features=128, out_features=1296, bias=True)
      (5): Unflatten(dim=1, unflattened_size=(4, 18, 18))
    )
    (conv_decoder): Sequential(
      (0): ConvTranspose2d(4, 8, kernel_size=(7, 7), stride=(1, 1))
      (1): ReLU()
      (2): ConvTranspose2d(8, 10, kernel_size=(3, 3), stride=(1, 1))
      (3): ReLU()
      (4): ConvTranspose2d(10, 1, kernel_size=(3, 3), stride=(1, 1))
    )
  )
)
img = model(mnist[0][0])
img.shape
torch.Size([1, 1, 28, 28])
imshow_image(img)
5. Train Model¶
Now it's time for training.
Define Loss and Optimizer¶
Since the AutoEncoder model tries to reconstruct the image it is fed, we can treat this as a regression problem in which the model tries to predict each pixel. Thus, nn.MSELoss
is used as the model's loss function.
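For reference, with its default mean reduction, nn.MSELoss averages the squared pixel errors over the whole batch:
$$ \mathcal{L}(x, \hat{x}) = \frac{1}{N \cdot D} \sum_{i=1}^{N} \sum_{j=1}^{D} \left( \hat{x}^{\,i}_{j} - x^{i}_{j} \right)^2 $$
where $N$ is the batch size, $D = 784$ is the number of pixels, and $\hat{x}^{\,i}$ is the reconstruction of image $x^{i}$.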
criterion = nn.MSELoss()
# define model parameters to be updated during back-propagation
params_to_optimize = [
{'params': model.encoder.parameters()},
{'params': model.decoder.parameters()}
]
An optim.Adam
optimizer is used with a learning rate of 0.001.
opt = optim.Adam(params_to_optimize,lr=0.001)
Let's Training The Model¶
# This function is defined to train the model with MSELoss and the Adam optimizer.
def train(
    model,
    criterion,
    optimizer,
    train_loader,
    epochs=1,
    loggings: bool = True,
    loggings_iter: int = 400,
) -> None:
    model.train()
    for epoch in range(epochs):
        for i, (img, _) in enumerate(train_loader):
            optimizer.zero_grad()
            gen_img = model(img)
            # Reshape the generated image to (N, 1, 784) to match the flattened input
            loss = criterion(gen_img.flatten(2, -1), img)
            loss.backward()
            optimizer.step()
            if loggings and i % int(loggings_iter) == 0:
                print(f"Epochs: {epoch:4d} | Iteration: {i:4d}| Loss: {loss.item():4.7f}")
                print("-"*140)
                plot_subplots(img.squeeze(1).view(-1, 28, 28))
                plot_subplots(gen_img.squeeze(1).view(-1, 28, 28))
    print("-"*140)
    print("Training is finished :) ")
# Let's start training, enjoy the show :).
train(
model,
criterion,
optimizer=opt,
train_loader=train_loader,
epochs=2
)
Epochs: 0 | Iteration: 0| Loss: 0.1339402
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 400| Loss: 0.0267787
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 800| Loss: 0.0217806
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 1200| Loss: 0.0183262
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 1600| Loss: 0.0168445
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 0| Loss: 0.0182455
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 400| Loss: 0.0134456
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 800| Loss: 0.0134334
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 1200| Loss: 0.0128348
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 1600| Loss: 0.0105310
--------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
Training is finished :)
The training is finished, very nice. We can notice that the model starts with noisy images, but after only 400 iterations we can already identify digits.
6. Save the Models: AutoEncoder, Encoder, Decoder¶
After the model is trained, it's best practice to save our model for safety reasons, e.g. to avoid losing it if the Jupyter kernel restarts. Let's save the whole model, then each component alone.
Why? Because after training we can split the encoder from the decoder, and each component can be used in a different task; see the Applications section for further information.
# torch.save(model.state_dict(), "./models/AutoEncoders_V1.pth")
# torch.save(model.encoder.state_dict(), "./models/Encoder_V1.pth")
# torch.save(model.decoder.state_dict(), "./models/Decoder_V1.pth")
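As a small illustration of the point above (a sketch, assuming the commented-out saves were actually run), the encoder can later be reloaded on its own, e.g. to reuse it for transfer learning. Note that it must be created with the same embeddings_size (100) used by the trained AutoEncoders model:
# Sketch: reload only the encoder (assumes ./models/Encoder_V1.pth was saved above)
encoder_only = Encoder(embeddings_size=100)
encoder_only.load_state_dict(torch.load("./models/Encoder_V1.pth"))
encoder_only.eval()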
Let's Load the Model¶
model = AutoEncoders()
model.load_state_dict(torch.load("./models/AutoEncoders_V1.pth"))
<All keys matched successfully>
model
AutoEncoders( (encoder): Encoder( (conv_encoder): Sequential( (0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1)) (1): Conv2d(10, 8, kernel_size=(3, 3), stride=(1, 1)) (2): ReLU() (3): Conv2d(8, 4, kernel_size=(7, 7), stride=(1, 1)) (4): Flatten(start_dim=1, end_dim=-1) ) (linear_encoder): Sequential( (0): Linear(in_features=1296, out_features=128, bias=True) (1): Linear(in_features=128, out_features=32, bias=True) (2): ReLU() (3): Linear(in_features=32, out_features=100, bias=True) ) ) (decoder): Decoder( (linear_decoder): Sequential( (0): Linear(in_features=100, out_features=32, bias=True) (1): ReLU() (2): Linear(in_features=32, out_features=128, bias=True) (3): ReLU() (4): Linear(in_features=128, out_features=1296, bias=True) (5): Unflatten(dim=1, unflattened_size=(4, 18, 18)) ) (conv_decoder): Sequential( (0): ConvTranspose2d(4, 8, kernel_size=(7, 7), stride=(1, 1)) (1): ReLU() (2): ConvTranspose2d(8, 10, kernel_size=(3, 3), stride=(1, 1)) (3): ReLU() (4): ConvTranspose2d(10, 1, kernel_size=(3, 3), stride=(1, 1)) ) ) )
7. Applications of AutoEncoders¶
In fact, after the AutoEncoders model is trained, we can use the Encoder network to get a vector representation of a given input, while the Decoder tries to reconstruct the data. The following are some use cases of AutoEncoders:
Dimensionality Reduction: see the following section; AutoEncoders can be considered a non-linear generalisation of PCA.
Classification/Regression Cases: The Encoder can also be seen as a feature extractor, meaning that during back-propagation it learns the most important features/patterns. So we can use it for Transfer Learning, i.e. connect it to a classifier or regressor head to perform regression or classification for a specific problem.
Compute Similarity: We can feed two samples to the Encoder, get their vector representations, and then compute a cosine metric to measure the similarity between these inputs (see the sketch after this list).
Generating new instances / Data Augmentation: After the model is trained, we can use the Decoder to generate new instances. This is useful when you don't have much data, and it's powerful because the model learns the data distribution during training.
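Here is a minimal sketch of the similarity use case (the image indices below are arbitrary choices, not part of the original notebook): we compare two MNIST images through the trained encoder and a cosine metric.
import torch.nn.functional as F

# Encode two arbitrary MNIST images and compare their embeddings
img_a, _ = mnist[0]
img_b, _ = mnist[1]

with torch.inference_mode():
    z_a = model.encoder(img_a)   # shape: (1, 100)
    z_b = model.encoder(img_b)

similarity = F.cosine_similarity(z_a, z_b)
print(similarity.item())  # closer to 1.0 means more similar in embedding space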
Dimensionality Reduction¶
The AutoEncoder architecture was first proposed as a non-linear generalisation of PCA in the paper titled Reducing the Dimensionality of Data with Neural Networks. As we saw in the previous sections, AutoEncoders come with two networks, the Encoder and the Decoder. The Encoder learns the features and patterns of the input data: it transforms the high-dimensional data into a low-dimensional space called the embedding, denoted by $Z$. On the other hand, the Decoder network reconstructs the data given the embedding $Z$.
AutoEncoders Vs PCA¶
In the article An Introduction to Autoencoders written by Umberto Michelucci, in the dimensionality reduction section, he compares autoencoders with PCA:
Autoencoders can deal with a very large amount of data very efficiently since their training can be done with mini-batches, while PCA operates on the entire dataset. This can be an issue when the dataset is very large.
PCA provides a linear transformation, which may not capture non-linear relationships well. However, autoencoders are more flexible; by using activation functions, we can capture these kinds of information.
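To make the comparison concrete, here is a small sketch (it uses scikit-learn's PCA, an extra assumption on top of the notebook's imports) that contrasts the linear PCA reconstruction with the autoencoder's non-linear reconstruction on a sample of images:
from sklearn.decomposition import PCA

# Take a sample of flattened MNIST images, shape (N, 784)
sample_loader = DataLoader(train_data, batch_size=2048, shuffle=True)
imgs, _ = next(iter(sample_loader))
X = imgs.view(-1, 28 * 28).numpy()

# Linear reduction to 100 components with PCA, then reconstruct
pca = PCA(n_components=100)
X_pca = pca.inverse_transform(pca.fit_transform(X))

# Non-linear reduction to 100-d embeddings with the trained autoencoder, then reconstruct
with torch.inference_mode():
    X_ae = model(imgs).view(-1, 28 * 28).numpy()

# Compare the mean squared reconstruction errors of the two approaches
print("PCA reconstruction MSE:", ((X - X_pca) ** 2).mean())
print("AE  reconstruction MSE:", ((X - X_ae) ** 2).mean())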
MNIST Classification¶
class MnistClassifier(nn.Module):
    def __init__(self) -> None:
        super(MnistClassifier, self).__init__()
        # Reuse the pretrained encoder from the saved AutoEncoders model
        _base_model = AutoEncoders()
        _base_model.load_state_dict(torch.load("./models/AutoEncoders_V1.pth"))
        self.encoder = _base_model.encoder
        # Small classification head on top of the 100-d embeddings
        self.classifier = nn.Sequential(
            nn.Linear(100, 32),
            nn.ReLU(),
            nn.Linear(32, 10)
        )

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        out = self.encoder(input)
        return self.classifier(out)
clf = MnistClassifier()
clf
MnistClassifier(
  (encoder): Encoder(
    (conv_encoder): Sequential(
      (0): Conv2d(1, 10, kernel_size=(3, 3), stride=(1, 1))
      (1): Conv2d(10, 8, kernel_size=(3, 3), stride=(1, 1))
      (2): ReLU()
      (3): Conv2d(8, 4, kernel_size=(7, 7), stride=(1, 1))
      (4): Flatten(start_dim=1, end_dim=-1)
    )
    (linear_encoder): Sequential(
      (0): Linear(in_features=1296, out_features=128, bias=True)
      (1): Linear(in_features=128, out_features=32, bias=True)
      (2): ReLU()
      (3): Linear(in_features=32, out_features=100, bias=True)
    )
  )
  (classifier): Sequential(
    (0): Linear(in_features=100, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=10, bias=True)
  )
)
imgs, labels = next(iter(train_loader))
imshow_image(imgs[0])
clf(imgs[0])
tensor([[-0.0270, -0.0376, -0.2098, 0.0372, -0.0946, 0.6591, -0.2061, -0.0600, -0.0110, -0.1383]], grad_fn=<AddmmBackward0>)
criterion = nn.CrossEntropyLoss()
# Only the classifier head is optimized; the pretrained encoder is not updated
opt = optim.Adam(clf.classifier.parameters(), lr=0.001)
def train(epochs: int = 1):
    for epoch in range(epochs):
        for i, (imgs, labels) in enumerate(train_loader):
            opt.zero_grad()
            labels_pred = clf(imgs)
            loss = criterion(labels_pred, labels)
            loss.backward()
            opt.step()
            if i % 500 == 0:
                print(f"Epochs: {epoch:4d} | Iteration: {i:4d}| Loss: {loss.item():4.7f}")
                print("-"*140)
train(epochs=2)
Epochs: 0 | Iteration: 0| Loss: 2.4361408
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 500| Loss: 0.4572141
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 1000| Loss: 0.2291873
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 0 | Iteration: 1500| Loss: 0.0312376
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 0| Loss: 0.2579556
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 500| Loss: 0.1429092
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 1000| Loss: 0.1931087
--------------------------------------------------------------------------------------------------------------------------------------------
Epochs: 1 | Iteration: 1500| Loss: 0.0513597
--------------------------------------------------------------------------------------------------------------------------------------------
with torch.inference_mode():
input , label = next(iter(DataLoader(train_data, batch_size=50000)))
label_pred = torch.argmax(clf(input), axis=1)
print(classification_report(label_pred, label))
              precision    recall  f1-score   support

           0       0.98      0.95      0.96      5056
           1       0.98      0.95      0.96      5852
           2       0.92      0.95      0.93      4849
           3       0.92      0.90      0.91      5229
           4       0.94      0.95      0.94      4813
           5       0.90      0.94      0.92      4331
           6       0.97      0.94      0.96      5084
           7       0.96      0.94      0.95      5304
           8       0.86      0.94      0.90      4497
           9       0.92      0.91      0.92      4985

    accuracy                           0.94     50000
   macro avg       0.93      0.94      0.94     50000
weighted avg       0.94      0.94      0.94     50000
with torch.inference_mode():
input , label = next(iter(test_loader))
label_pred = torch.argmax(clf(input), axis=1)
print(classification_report(label_pred, label))
              precision    recall  f1-score   support

           0       0.98      0.94      0.96      1052
           1       0.99      0.93      0.96      1189
           2       0.91      0.94      0.93       936
           3       0.92      0.90      0.91      1027
           4       0.93      0.96      0.94       960
           5       0.90      0.93      0.92       877
           6       0.97      0.95      0.96      1028
           7       0.96      0.94      0.95      1087
           8       0.85      0.95      0.90       857
           9       0.91      0.92      0.92       987

    accuracy                           0.94     10000
   macro avg       0.93      0.94      0.93     10000
weighted avg       0.94      0.94      0.94     10000
Generative modeling & Data Augmentation¶
Generative modeling with AutoEncoder models? How is that possible?
After the model has been thoroughly trained on a sufficient dataset using an unsupervised approach, we utilize the Decoder model to generate new data that follows the same distribution learned during training. However, the Decoder requires a $Z$ Embedding as input. How can we generate this $Z$ embedding? Is it possible to do so? If yes, how can we achieve this in a random manner to produce diverse data each time?
In this section, we will address these questions 😁.
def get_mean_std_of(digit: int = 0) -> tuple:
    # Select all images of the given digit and compute their embeddings
    digit_set = Subset(mnist, indices=tuple(i for i, cls in enumerate(mnist.targets) if cls == digit))
    digit_x = torch.stack([d[0].flatten() for d in digit_set])
    emb_x = model.encoder(digit_x)
    # Return the embeddings plus the per-component mean and std
    return emb_x, emb_x.mean(axis=0), emb_x.std(axis=0)

def plot_emb_dist_of(digit_emb: torch.Tensor):
    # Plot the distribution of each of the 100 embedding components
    fig, axes = plt.subplots(20, 5, figsize=(18, 40))
    for i in range(20):
        for j in range(5):
            sns.histplot(ax=axes[i, j], data=digit_emb[:, i * 5 + j].detach().numpy(), kde=True)
    plt.show()
digit_emb, digit_mean, digit_std = get_mean_std_of(9)
To obtain digit_emb
, we feed all images of a digit (in our case, 9) to the encoder and get the embeddings $Z$: a matrix containing the vector representations of all the fed images of digit 9. After that we compute the mean()
and std()
with respect to axis=0. Formally,
$$ Z = \begin{bmatrix} \cdots & Z^1 & \cdots \\ \cdots & Z^i & \cdots \\ \cdots & Z^N & \cdots \\ \end{bmatrix} $$
With $N$ denoting the number of images, $Z^i$ is the embedding vector of the $i$-th image. For each component of $Z^i$ we compute the mean and std.
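Concretely, for the $j$-th embedding component the statistics computed above are (torch.std uses the unbiased $N-1$ denominator by default):
$$ \mu_j = \frac{1}{N} \sum_{i=1}^{N} Z^i_j, \qquad \sigma_j = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( Z^i_j - \mu_j \right)^2} $$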
digit_emb
tensor([[-0.7366, -0.4719, -1.4637, ..., 0.4467, 0.0897, -0.3719], [-0.7426, -0.2930, 0.5301, ..., 0.1755, 0.3119, -0.5933], [-0.7640, -0.2663, 0.1523, ..., 0.8241, 0.5406, -1.1856], ..., [-1.5418, 0.3600, 0.3364, ..., 0.0881, 0.8023, -0.4454], [-1.3054, 0.1946, 0.4422, ..., -0.2022, 0.3665, -0.8218], [-1.1912, 0.1435, 0.5495, ..., 0.0060, -0.0938, -0.6812]], grad_fn=<AddmmBackward0>)
digit_emb.shape
torch.Size([5949, 100])
The digit_emb
matrix contains the embeddings of 5949 images, and each vector lives in a 100-dimensional space.
digit_mean
tensor([-1.2662, -0.0083, -0.5534, 0.8313, -1.0519, 0.0355, -0.2647, 0.0503, -0.6883, -0.1284, -0.2298, -2.6558, -0.2886, 0.7134, -0.4494, -1.1866, -0.0667, 1.8808, -0.2912, 1.1527, 1.2619, -1.4137, 0.5650, -0.1574, -1.9292, 0.2559, 0.1122, 0.7945, -2.3893, 0.1416, -0.2982, 0.3167, -1.7033, -0.0260, -2.3307, 0.9711, -0.6544, -0.1949, 0.0745, -0.0372, 0.2322, -2.0985, 0.5538, 0.4549, 0.5401, 0.3159, 0.2779, -0.6747, 0.1522, -2.6267, -0.2635, -1.4348, 0.9232, -0.1682, 0.0187, -0.2834, 0.5802, -0.9982, 0.1303, 0.0860, 0.1290, -1.7354, 0.3006, -0.6328, 0.3462, 0.1124, -0.8873, -0.3121, -0.0027, -2.2852, 0.7619, 1.2396, -0.5202, 0.1581, 0.4499, -0.3522, -1.9701, -1.7416, -0.0552, -0.3280, 0.6288, 0.6407, 0.1948, -0.6247, 1.1016, 0.7926, 1.2080, -0.4066, 0.1804, 0.9586, -0.5031, 0.0623, -0.0550, 0.2794, -0.3806, 0.5647, -2.4779, 0.5876, -0.0717, -0.4997], grad_fn=<MeanBackward1>)
digit_std
tensor([0.5857, 0.4373, 0.6236, 0.5086, 0.8564, 0.4170, 0.3933, 0.5431, 0.6389, 0.7187, 0.3988, 0.7239, 0.4142, 0.4460, 0.4337, 0.4392, 0.3893, 0.5184, 0.4619, 0.6952, 0.4890, 0.6142, 0.5308, 0.3499, 0.4983, 0.4681, 0.4485, 0.3974, 0.9741, 0.5993, 0.6073, 0.4521, 0.6441, 0.5165, 0.7473, 0.5060, 0.5504, 0.6165, 0.3680, 0.6795, 0.4656, 0.7644, 0.7590, 0.3959, 0.4244, 0.4839, 0.3207, 0.4111, 0.6288, 0.5481, 0.5095, 0.4486, 0.4098, 0.3575, 0.5670, 0.5506, 0.5754, 0.5533, 0.4342, 0.8434, 0.4537, 0.6176, 0.5008, 0.8992, 0.4078, 0.3981, 0.2968, 0.4570, 0.7115, 0.5865, 0.6330, 0.6505, 0.4761, 0.4638, 0.4609, 0.4045, 0.5049, 0.6141, 0.6175, 0.3163, 0.4052, 0.5183, 0.7035, 0.4513, 0.4975, 0.4532, 0.6242, 0.5191, 0.4351, 0.6003, 0.4255, 0.4084, 0.6498, 0.6538, 0.5847, 0.5047, 0.8519, 0.4414, 0.5818, 0.6135], grad_fn=<StdBackward0>)
plot_emb_dist_of(digit_emb)
The plots above represent the distribution of each component (column vector) of the matrix $Z$. As we can notice, these components follow something close to a normal distribution. Formally:
$$ Z^i_j \sim \mathcal{N}(\mu_j,\,\sigma_j^{2}) \quad \forall j \in \{1, \dots, M\} $$
Where $M$ is the embedding vector dimension.
🚨 Very Important:
So we can use this information to generate new embedding vectors from a normal distribution with the corresponding mean and std, then feed them to the Decoder model to generate new images that it has never seen before.
Are you excited to generate images?
gen_z = torch.normal(digit_mean, digit_std).view(-1, 100)
gen_z
tensor([[-1.0992, -0.1533, -0.5911, 0.7690, -0.6242, -0.3955, -0.7678, 0.8706, -1.2348, 0.5609, 0.2802, -2.1522, -0.1627, 0.8093, -0.8000, -1.2148, 0.4875, 1.9073, 0.5891, 0.9994, 1.3732, -1.3779, 1.8257, 0.3863, -1.6149, 0.1700, 0.8696, 0.9673, -2.2741, -0.7921, -0.8481, 0.4241, -0.5646, -0.4899, -1.8520, 1.3170, -0.6508, -0.6952, 0.1734, 1.1369, 0.1059, -2.6506, 0.8315, 0.5664, 0.6197, 0.6036, 0.2836, -0.5433, 0.8050, -2.6792, -0.1089, -1.8175, 0.7994, 0.4014, -0.1033, 0.8018, -0.6101, -0.6959, 0.8952, -0.8804, 0.3725, -2.2469, -0.4047, -0.4948, 0.2841, 0.0544, -0.6120, -0.3767, -1.8870, -2.6936, 1.4715, 1.3186, -1.2235, 0.5947, 0.4890, -0.6455, -2.4722, -1.0295, 1.2659, -0.7462, 1.1401, 1.3221, 0.8662, -0.6891, 0.6284, 0.3139, 1.6403, -0.9198, 0.2288, 1.1073, -0.6519, -0.1506, -0.0647, -0.0863, 0.2505, -0.4287, -2.2031, 0.9962, -0.0170, -0.1522]], grad_fn=<ViewBackward0>)
gen_img = model.decoder(gen_z)
imshow_image(gen_img)
Woooow, great work 🎉🎉🎉🎉🎉🎉. Let's do it one more time.
gen_other_z = torch.normal(digit_mean, digit_std).view(-1, 100)
gen_other_img = model.decoder(gen_other_z)
imshow_image(gen_other_img)
References¶
AutoEncoders original paper by G. E. Hinton and R. R. Salakhutdinov: Reducing the Dimensionality of Data with Neural Networks
An Introduction to Autoencoders by Umberto Michelucci