
VisualReF: Visual Relevance Feedback Prototype for Interactive Image Retrieval

This is an official implementation of the demo paper "VisualReF: Visual Relevance Feedback Prototype for Interactive Image Retrieval" presented at RecSys '25.

VisualReF is a prototype of an interactive image search system based on visual relevance feedback.

The system uses relevance feedback provided by the user to improve the search results. Specifically, the user can mark retrieved images as relevant or irrelevant. These annotations are then used by an image captioning model (currently LLaVA-1.5 7B) to generate captions for image fragments from the relevance feedback. The captions are then used to refine the search results with Rocchio's algorithm.
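The refinement step follows the classic Rocchio update: the query embedding is pulled toward embeddings of content marked relevant and pushed away from content marked irrelevant. A minimal NumPy sketch (the function name and the alpha/beta/gamma weights are illustrative defaults, not the prototype's exact implementation):

```python
import numpy as np

def rocchio_update(query, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio update on embedding vectors.

    query:      (d,)   original query embedding
    relevant:   (n, d) embeddings of items marked relevant
    irrelevant: (m, d) embeddings of items marked irrelevant
    """
    updated = alpha * query
    if len(relevant):
        updated = updated + beta * relevant.mean(axis=0)
    if len(irrelevant):
        updated = updated - gamma * irrelevant.mean(axis=0)
    # Re-normalise so cosine / inner-product retrieval stays comparable.
    return updated / np.linalg.norm(updated)
```

In the prototype the feedback signal is derived from the generated fragment captions; in this sketch any set of embedding vectors works.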

Bibtex:

@inproceedings{10.1145/3705328.3759341,
author = {Khaertdinov, Bulat and Popa, Mirela and Tintarev, Nava},
title = {VisualReF: Interactive Image Search Prototype with Visual Relevance Feedback},
year = {2025},
isbn = {9798400713644},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3705328.3759341},
doi = {10.1145/3705328.3759341},
booktitle = {Proceedings of the Nineteenth ACM Conference on Recommender Systems},
pages = {1353–1356},
numpages = {4},
series = {RecSys '25}
}

Example use case:

(a) Input query and retrieved images:

(b) User annotates the relevance and irrelevance of the retrieved images, and the system returns an explanation of the relevance (captioning results):

(c) User launches the refinement process and gets the updated search results:

Check more examples in the ./assets folder.

Configs:

  • Captioning model and prompts: configs/captioning/. Prompts used to generate captions can be changed based on the use case and level of detail required. Advanced prompting techniques can be used to improve the quality of the captions.
  • Demo: configs/demo/. Configs for the demo app: retrieval backbone (CLIP and SigLIP are currently supported), image database, and captioning model.

Getting started

The prototype is implemented using Gradio, PyTorch, and Hugging Face. It supports CPU and GPU, with a GPU (16+ GB VRAM) recommended for faster inference.

Python version: 3.11

We use venv for managing project dependencies.

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Data

We use two open-source datasets as use cases for our demo:

  1. General image search with COCO dataset:

    Data preparation:

    mkdir data
    mkdir data/coco
    
    cd data
    wget http://images.cocodataset.org/zips/train2014.zip
    wget http://images.cocodataset.org/zips/val2014.zip
    wget http://images.cocodataset.org/zips/test2014.zip
    
    unzip train2014.zip -d coco/
    unzip val2014.zip -d coco/
    unzip test2014.zip -d coco/
    

    Build a faiss index with clip-vit-large-patch14 for the test set:

    python write_faiss_index.py \
        --data data/coco/test2014 \
        --output faiss/coco/ \
        --batch_size 64 \
        --model_family clip \
        --model_id openai/clip-vit-large-patch14
    

    It is also possible to index the whole database (will take longer) with --data data/coco/.

  2. Retail catalogue search with Retail-786k:

    Data preparation:

    wget https://zenodo.org/records/7970567/files/retail-786k_256.zip?download=1 -O retail-786k_256.zip
    
    unzip retail-786k_256.zip -d data/
    

    Build faiss index with clip-vit-large-patch14:

    python write_faiss_index.py \
        --data data/retail-786k_256/ \
        --output faiss/retail/test \
        --batch_size 64 \
        --model_family clip \
        --model_id openai/clip-vit-large-patch14
    

Launch the prototype

Local monolith version (demo/app.py)

  • With image database based on COCO dataset and clip-vit-large-patch14:

    python -m demo.app \
        --config_path configs/demo/coco_clip_large.yaml \
        --captioning_model_config_path configs/captioning/llava_8bit.yaml 
    
  • With image database based on Retail-786k dataset and clip-vit-large-patch14:

    python -m demo.app \
        --config_path configs/demo/retail_clip_large.yaml \
        --captioning_model_config_path configs/captioning/retail_llava_8bit.yaml
    

Service-based (client-server) architecture

Server (FastAPI)

Make sure all dependencies are installed:

pip install -r requirements.txt

Launch the retrieval server with GPU support (faiss, retrieval backbone, VLMs):

CONFIG_PATH=configs/demo/coco_clip_large.yaml \
CAPTIONING_CONFIG_PATH=configs/captioning/llava_8bit.yaml \
python -m server.retrieval_server

Check status:

curl -s http://localhost:8000/health

Client (Gradio)

Install client requirements:

pip install -r requirements-client.txt

Check that server responds:

curl -s http://<SERVER_IP>:8000/health

Launch the client with the Gradio interface:

python -m demo.app_client \
    --config_path configs/demo/coco_clip_base.yaml \
    --captioning_model_config_path configs/captioning/llava_8bit.yaml \
    --server_url http://<SERVER_IP>:8000 \
    --port 7861
