@Sensiel Sensiel commented Nov 5, 2025

This PR introduces an independent, end-to-end RAG pipeline for PDF documents, built to showcase how easy the ColPali model is to use for embedding and retrieval.

The pipeline demonstrates:

  • Easy Integration: A self-contained example of using ColPali for generating dense vector embeddings.
  • Efficient Retrieval: Implements a retrieval system that uses ColPali's embeddings for context fetching.
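
For readers unfamiliar with how ColPali embeddings are compared at retrieval time, here is a minimal sketch of late-interaction (MaxSim) scoring, the scoring scheme ColPali is designed around. It is illustrative only and is not taken from this PR: the function names (`maxsim_score`, `rank_pages`), the page identifiers, and the tiny 2-dimensional vectors are all assumptions, and a real pipeline would run this on tensors produced by the model rather than on Python lists.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for every query vector keep its best-matching
    page vector, then sum those maxima over the whole query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Dict[str, Sequence[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every page against the query and return the top_k best pages."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: one query with two token vectors, two PDF pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "report.pdf#1": [[1.0, 0.0], [0.5, 0.5]],
    "report.pdf#2": [[0.0, 0.2], [0.1, 0.1]],
}
ranking = rank_pages(query, pages, top_k=2)
```

In this toy example the first page ranks highest because each query vector finds a close match among its patch vectors, which is the property that makes multi-vector embeddings useful for context fetching.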

@fabnemEPFL fabnemEPFL requested a review from Copilot November 5, 2025 14:21

Copilot AI left a comment

Pull Request Overview

This PR adds a ColPali-based document retrieval system with three main components: PDF processing, embedding indexing, and query retrieval. The implementation uses ColPali embeddings with Milvus for vector storage and supports API, batch, and single-query retrieval modes.

Key Changes

  • Introduces a PDF processing pipeline that converts PDFs to embeddings using ColPali models
  • Implements a Milvus-based indexing system for storing and searching ColPali embeddings
  • Adds retrieval functionality with API server, batch processing, and single-query modes

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| src/mmore/colpali/run_retriever.py | Implements query retrieval with multiple modes (API/batch/single) |
| src/mmore/colpali/run_process.py | Processes PDFs to generate ColPali embeddings and stores them in Parquet |
| src/mmore/colpali/run_index.py | Indexes embeddings from Parquet into Milvus database |
| src/mmore/colpali/milvuscolpali.py | Core Milvus operations manager for ColPali embeddings |
| examples/colpali/config_retrieval.yml | Configuration for retrieval operations |
| examples/colpali/config_process.yml | Configuration for PDF processing |
| examples/colpali/config_index.yml | Configuration for indexing operations |
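
To make the indexing step more concrete: a page embedded with ColPali yields many vectors, while a standard vector field in a store such as Milvus holds one vector per row, so a common approach is to write one row per vector and regroup the rows by page at query time. The sketch below shows that flattening and regrouping in plain Python. The field names (`doc_id`, `seq_id`, `pdf_path`, `page`, `vector`) and both helper functions are hypothetical and may not match the schema used in `run_index.py` or `milvuscolpali.py`.

```python
from typing import Dict, Iterable, Iterator, List, Sequence


def flatten_page(
    doc_id: int, pdf_path: str, page: int, vectors: Sequence[Sequence[float]]
) -> Iterator[dict]:
    """Turn one page's multi-vector embedding into one row per vector.

    seq_id records the position of each vector within the page so the
    original ordering can be restored later.
    """
    for seq_id, vec in enumerate(vectors):
        yield {
            "doc_id": doc_id,
            "seq_id": seq_id,
            "pdf_path": pdf_path,
            "page": page,
            "vector": list(vec),
        }


def group_by_doc(rows: Iterable[dict]) -> Dict[int, List[List[float]]]:
    """Rebuild each page's list of vectors from flat rows, ordered by seq_id."""
    grouped: Dict[int, List[List[float]]] = {}
    for row in sorted(rows, key=lambda r: (r["doc_id"], r["seq_id"])):
        grouped.setdefault(row["doc_id"], []).append(row["vector"])
    return grouped


# Hypothetical toy data: two pages of the same PDF with 2-dimensional vectors.
rows = list(flatten_page(0, "a.pdf", 1, [[0.1, 0.2], [0.3, 0.4]]))
rows += list(flatten_page(1, "a.pdf", 2, [[0.5, 0.6]]))
restored = group_by_doc(rows)
```

Keeping `doc_id` and `seq_id` on every row is what allows the retriever to fetch all vectors of a candidate page again before rescoring it.
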
Comments suppressed due to low confidence (1)

src/mmore/colpali/run_retriever.py:2

  • Import of 'concurrent' is not used.
import concurrent.futures

@fabnemEPFL fabnemEPFL requested a review from Copilot December 4, 2025 18:51
Copilot finished reviewing on behalf of fabnemEPFL December 4, 2025 18:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 18 comments.

