
Reverse-Visual-Search


Figure 1: Reverse Visual Search

  • To view the outputs for the 10 query images in the notebooks, check out:
  1. Output of Baseline Model
  2. Output of Improved Model

PROBLEM STATEMENT

We have all played the game of "spot the difference," in which we look for differences between two similar images. Building on that idea: can you find images that are similar to a given image? Google Reverse Image Search is an apt description of what we are trying to build in this project. Our problem statement is to find the N most similar images, given an input image.

DATASET

In this project we use the LFW (Labeled Faces in the Wild) dataset, a collection of face images with matched and mismatched pairs compiled by researchers at the University of Massachusetts. The dataset comprises 13,233 images of 5,749 people, 1,680 of whom have two or more images. LFW is the benchmark for pair matching, also known as face verification. We build our models on this dataset and then use them for reverse visual search on a given face image.


Figure 2: LFW Dataset

ARCHITECTURE


Figure 3: Model Architecture

Steps for reverse visual search:

  1. Generate embeddings for the entire dataset.
  2. Store these embeddings in a vector database.
  3. Generate the embedding for the query image.
  4. Search for the 20 closest neighbors in the vector database.
  5. Return the results (see the sketch below).
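
A minimal sketch of this pipeline, with randomly generated vectors standing in for real embeddings and scikit-learn's NearestNeighbors standing in for the vector database:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in embeddings: in the real pipeline these come from ResNet50/FaceNet.
rng = np.random.default_rng(0)
dataset_embeddings = rng.normal(size=(13233, 128))  # one vector per LFW image
query_embedding = rng.normal(size=(1, 128))         # embedding of the query image

# Steps 2 and 4: index the dataset vectors, then find the 20 closest neighbors.
index = NearestNeighbors(n_neighbors=20, metric="euclidean").fit(dataset_embeddings)
distances, neighbor_ids = index.kneighbors(query_embedding)

# Step 5: neighbor_ids[0] holds the indices of the 20 most similar images.
print(neighbor_ids[0])
```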

To develop a model from the ground up, we built three different architectures and tried to improve the results with each iteration. The models are discussed below:

  1. Developing a Baseline Model: ResNet50 generates the embedding of an image, which is fed to a searching ML model, in this case K-Nearest Neighbors.
  2. Initial Improvement on Baseline Performance: MTCNN face detection extracts the target images (faces), and ResNet50 with K-Nearest Neighbors (as in the baseline model) performs the visual search.
  3. Final Improvement on Baseline Performance: MTCNN face detection extracts the target faces from the image, FaceNet generates the embeddings, and Milvus searches for the similar images.

BASELINE MODEL


Figure 4: Baseline Model
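
A minimal sketch of the baseline embedding step, assuming Keras' ImageNet-pretrained ResNet50 with average pooling; the exact preprocessing in model-training-baseline.ipynb may differ:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# Headless ResNet50: global average pooling yields a 2048-d embedding per image.
embedder = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path: str) -> np.ndarray:
    """Load one image, preprocess it, and return its ResNet50 embedding."""
    img = image.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return embedder.predict(batch, verbose=0)[0]

# With `paths` holding every LFW image path, the K-Nearest Neighbors index from
# the sketch above is then fit on np.stack([embed(p) for p in paths]).
```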

Results for Baseline Model:


Figure 5: Carmen Electra


Figure 6: 20 Similar Faces for Carmen Electra

IMPROVED MODEL


Figure 7: Improved Model
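
A sketch of the improved front end, assuming the `mtcnn` and `keras-facenet` packages; the notebooks may use different wrappers, and 160×160 is the conventional FaceNet input size:

```python
import numpy as np
from PIL import Image
from mtcnn import MTCNN
from keras_facenet import FaceNet

detector = MTCNN()    # MTCNN face detection
embedder = FaceNet()  # FaceNet embeddings (512-d in keras-facenet's default model)

def face_embedding(path: str) -> np.ndarray:
    """Detect a face with MTCNN, crop it, and embed the crop with FaceNet."""
    pixels = np.asarray(Image.open(path).convert("RGB"))
    x, y, w, h = detector.detect_faces(pixels)[0]["box"]  # first detected face
    face = Image.fromarray(pixels[y:y + h, x:x + w]).resize((160, 160))
    return embedder.embeddings(np.asarray(face)[np.newaxis, ...])[0]
```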

Results for Improved Model:


Figure 8: Carmen Electra


Figure 9: 20 Similar Faces for Carmen Electra


Figure 10: Albert Costa


Figure 11: 20 Similar Faces for Albert Costa


Figure 12: Angela Bassett


Figure 13: 20 Similar Faces for Angela Bassett


Figure 14: Arminio Fraga


Figure 15: 20 Similar Faces for Arminio Fraga


Figure 16: Billy Crystal


Figure 17: 20 Similar Faces for Billy Crystal


Figure 18: Bob Graham


Figure 19: 20 Similar Faces for Bob Graham


Figure 20: Boris Becker


Figure 21: 20 Similar Faces for Boris Becker


Figure 22: Bulent Ecevit


Figure 23: 20 Similar Faces for Bulent Ecevit


Figure 24: Calista Flockhart


Figure 25: 20 Similar Faces for Calista Flockhart


Figure 26: Cameron Diaz


Figure 27: 20 Similar Faces for Cameron Diaz

Steps to Replicate the Experiment:

This project was completed on multiple machines:

1. Colab Pro (high RAM and GPU) for running the training notebooks.

a. model-training-baseline.ipynb

Input LFW Dataset: Input

Output: Output

b. model-training-facenet.ipynb

Input: Input

Output: Output

2. EC2 instance for Milvus.

a. model-training-milvus.ipynb

Input:

Output: Output

b. query/Improvement-final-Milvus.ipynb

Input:

Output Final Model: Output

3. Local machine for running the notebooks that generate outputs.

a. query/Preprocessing-Queries.ipynb

Input: Input

Output MTCNN: Output

b. query/Baseline_Model-Output.ipynb

Input:

Output Baseline Images: Output

c. query/Improvement-final-FaceNet.ipynb

Input:

Output: Output

Step 1

Download the LFW Dataset: Dataset
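
As an alternative to the manual download, scikit-learn can fetch LFW programmatically; this is a convenience, not the method used in the notebooks:

```python
from sklearn.datasets import fetch_lfw_people

# Downloads and caches the dataset on first use; color images, no downscaling.
lfw = fetch_lfw_people(color=True, resize=1.0)
print(lfw.images.shape, len(lfw.target_names))  # 13,233 faces of 5,749 people
```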

Step 2

In this step we generate embeddings for our training dataset, which is essentially the entire LFW dataset. For comparison purposes we also compute accuracy for the Baseline Model and the Improved Model. Note that accuracy is not strictly the right metric for this problem statement; we include it only to compare the models. We split the LFW dataset into train and test images (although when we generate results for query images we use the entire LFW dataset). The notebooks used in this step are as follows, with a storage-side sketch after the list:

(1) model-training-baseline.ipynb (Colab)
(2) model-training-facenet.ipynb (Colab)
(3) model-training-milvus.ipynb (EC2)
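
A minimal storage-side sketch using the pymilvus 2.x client; the collection name, vector dimension, and index parameters here are illustrative assumptions:

```python
import numpy as np
from pymilvus import (Collection, CollectionSchema, DataType, FieldSchema,
                      connections)

connections.connect(host="localhost", port="19530")  # Milvus on the EC2 instance

schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=512),  # FaceNet dimension
])
collection = Collection("lfw_faces", schema)

# Insert the dataset embeddings, then build an IVF_FLAT index for fast search.
embeddings = np.random.rand(13233, 512).astype(np.float32)  # stand-in vectors
collection.insert([embeddings.tolist()])
collection.create_index("embedding", {"index_type": "IVF_FLAT",
                                      "metric_type": "L2",
                                      "params": {"nlist": 128}})
```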

Step 3

In this step we retrieve 20 similar faces for each of 10 query images. The query images are the first 10 images at http://vis-www.cs.umass.edu/lfw/number_6.html. The notebooks used in this step are as follows, with a query-side sketch after the list:

(1) Preprocessing step: query/Preprocessing-Queries.ipynb (first notebook to run)
(2) Baseline Model: query/Baseline_Model-Output.ipynb
(3) Final Improvement: query/Improvement-final-FaceNet.ipynb, query/Improvement-final-Milvus.ipynb
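
The query side, again as a hedged pymilvus sketch: load the collection and ask for the 20 nearest vectors.

```python
import numpy as np
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("lfw_faces")  # the collection built in Step 2
collection.load()                     # bring the index into memory

query = np.random.rand(512).tolist()  # stand-in for the FaceNet query embedding
results = collection.search([query], anns_field="embedding",
                            param={"metric_type": "L2", "params": {"nprobe": 10}},
                            limit=20)
for hit in results[0]:                # IDs and distances of the 20 closest faces
    print(hit.id, hit.distance)
```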

REFERENCES:

[1] https://towardsdatascience.com/understanding-and-coding-a-resnet-in-keras-446d7ff84d33
[2] https://towardsdatascience.com/how-does-a-face-detection-program-work-using-neural-networks-17896df8e6ff
[3] https://milvus.io/docs/image_similarity_search.md
[4] https://aws.amazon.com/blogs/machine-learning/building-a-visual-search-application-with-amazon-sagemaker-and-amazon-es/
[5] http://vis-www.cs.umass.edu/lfw/
[6] https://www.geeksforgeeks.org/facenet-using-facial-recognition-system/
[7] https://milvus.io/docs/v2.0.x/overview.md

Group Members

  1. Eashan Kaushik (AWS Contact)
  2. Rishab Redhu
  3. Rohan Jesalkumar Patel
