
Multimodal Search Engine

This project is a multimodal search engine built on OpenAI's CLIP, with a Flask API backend and an HTML/CSS frontend web application.

Introduction

This project provides a simple web interface where users enter a text query and the system retrieves the images that best match the description, using the CLIP architecture (read the paper).
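Under the hood, CLIP-style retrieval embeds the text query and ranks precomputed image embeddings by cosine similarity. A minimal sketch of that ranking step with NumPy (illustrative function and variable names, not this project's actual code):

```python
import numpy as np

def top_k(text_emb, image_embs, filenames, k=3):
    """Rank images by cosine similarity between the text embedding
    and each precomputed image embedding, returning the top-k matches."""
    # Normalize so the dot product equals cosine similarity.
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ text_emb
    order = np.argsort(scores)[::-1][:k]
    return [(filenames[i], float(scores[i])) for i in order]
```

In the real application the embeddings come from CLIP's text and image encoders; this sketch only shows how the similarity ranking itself works.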

Take a look

(Screenshots of the web interface.)

Demo Video

Watch the YouTube video

  • This video demonstrates the project's main feature.

How to use it with your own images

  • A sample dataset of 130 images is included in the repository, or see the demo video; otherwise:
  • Place your images in src/minidata
  • Run the notebook src/image-processor
  • Move the data in src/image_embeddings and the data in src/minidata to flaskapp/image_embeddings and flaskapp/static respectively (caution: transfer the files, not the directories)

Features

  • Multi-Modal Search: Users can input textual descriptions of images to retrieve relevant images.
  • Intuitive Web Interface: The frontend is built with HTML/CSS to provide a user-friendly experience.
  • Scalable Backend: Flask API serves as the backend, handling requests and interacting with the CLIP model.

Clone the repository:

git clone https://github.com/ahmedembeddedxx/multimodal-search-engine.git

Usage

Start the backend server:

cd flaskapp/
flask run

Access the web application in your browser at http://127.0.0.1:5000/.
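The backend follows the usual Flask pattern: a route receives the text query, embeds it, and returns ranked results. A minimal sketch of such an endpoint (the `/search` route, parameter name, and stub response are assumptions for illustration, not the app's actual API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/search")
def search():
    # Read the text query from the URL, e.g. /search?q=a+dog+on+a+beach
    query = request.args.get("q", "")
    # In the real app this is where CLIP would embed `query` and rank the
    # precomputed image embeddings; here we return a stub result instead.
    results = [{"image": "example.jpg", "score": 0.0, "query": query}]
    return jsonify(results)

# Start the server with `flask run` (or app.run()) as shown above.
```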

Stacks

  • OpenAI for developing CLIP.
  • Flask for the backend framework.

Future Expectations

  • Shift the app to ReactJS
  • Use ImageBind by Meta AI
  • More accurate model evaluation
  • Integrate audio and video functionality