
feat: enhanced scientific RAG pipeline with cross-encoder reranking and metadata filtering #5

Open
hobyt-aluzar wants to merge 1 commit into aietal:master from hobyt-aluzar:feature/enhanced-scientific-rag

@hobyt-aluzar

Enhanced Scientific RAG Pipeline with Cross-Encoder Reranking

This PR introduces an enhanced scientific RAG pipeline aimed at improving retrieval quality and citation accuracy, with semantic reranking that runs fully locally so documents stay private.

Key Changes:

  • Local Reranking: Integrated @xenova/transformers with the Xenova/ms-marco-MiniLM-L-6-v2 cross-encoder. The pipeline fetches an expanded pool of candidate documents from ChromaDB, scores each one against the user's query with the cross-encoder, and returns only the top-scoring results. Reranking runs fully offline once the model weights are available locally.
  • Scientific Chunking & Metadata: Updated inject-documents.ts to split text on separators targeted at scientific section headings (e.g., Abstract, Methods, Results). Each chunk's metadata now records its section and page, along with a dynamically generated fallback title for documents without a usable title.
  • Citation-Aware Generation: Modified rag-chat.ts to pass each chunk's Rerank Score and its citation key (e.g., [document-name:p1:c2]) into the LLM context. The updated prompt instructs the model to cite these keys, so each statement in a response can be traced back to a specific chunk.
  • Test Coverage: Added scientific-rag.test.ts, which covers citation-key stability, title normalization, and section detection.
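
The retrieve-then-rerank flow from the Local Reranking bullet can be sketched as follows. This is a minimal sketch, not the PR's code: Candidate, PairScorer, expandedPoolSize, selectTopK, and rerank are hypothetical names, and the pool multiplier (4) and cap (50) are illustrative assumptions. The expanded pool would be what is requested from ChromaDB (for example as the result count of a collection query), and the scorer is injected so the ranking logic can be read and tested without loading the model.

```typescript
// A candidate chunk as returned by the vector store (hypothetical shape).
interface Candidate {
  id: string;
  text: string;
  metadata: Record<string, string | number>;
}

interface RankedCandidate extends Candidate {
  rerankScore: number;
}

// One relevance score per (query, passage) pair, in passage order. In the PR this
// role is played by the Xenova/ms-marco-MiniLM-L-6-v2 cross-encoder.
type PairScorer = (query: string, passages: string[]) => Promise<number[]>;

// How many candidates to request from the vector store before reranking.
// The factor and cap are illustrative defaults; the result is never below topK.
function expandedPoolSize(topK: number, factor = 4, cap = 50): number {
  return Math.max(topK, Math.min(topK * factor, cap));
}

// Attach scores and keep the topK highest-scoring candidates.
function selectTopK(candidates: Candidate[], scores: number[], topK: number): RankedCandidate[] {
  if (scores.length !== candidates.length) {
    throw new Error(`expected ${candidates.length} scores, got ${scores.length}`);
  }
  return candidates
    .map((c, i) => ({ ...c, rerankScore: scores[i] }))
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, topK);
}

// Score the whole expanded pool against the query, then keep the best topK.
async function rerank(
  query: string,
  pool: Candidate[],
  score: PairScorer,
  topK: number,
): Promise<RankedCandidate[]> {
  if (pool.length === 0) return [];
  const scores = await score(query, pool.map((c) => c.text));
  return selectTopK(pool, scores, topK);
}
```

One way to back PairScorer with @xenova/transformers, following the model's Hugging Face model card, is to load AutoTokenizer and AutoModelForSequenceClassification for Xenova/ms-marco-MiniLM-L-6-v2, tokenize the query (repeated once per passage) with the passages as text_pair and padding/truncation enabled, and read one logit per pair; this should be checked against the library version the project pins.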
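
Section detection and metadata assembly from the Scientific Chunking & Metadata bullet might look like the sketch below. It assumes chunking has already happened and that a section heading sits alone on its own line, optionally numbered (e.g. "2. Methods"). The heading list, the ChunkMetadata shape, the 1-based chunk index, and the helper names detectSection, fallbackTitle, and buildChunkMetadata are all hypothetical, not taken from inject-documents.ts.

```typescript
// Heading on its own line, optionally numbered ("2.", "3.1") and optionally
// followed by a colon. The list of section names is an illustrative assumption.
const SECTION_HEADING =
  /^[ \t]*(?:\d+(?:\.\d+)*\.?[ \t]+)?(abstract|introduction|methods|results|discussion|conclusions?|references)[ \t]*:?[ \t]*$/im;

// Return the first recognised section heading in a chunk, in title case.
function detectSection(text: string): string | undefined {
  const m = SECTION_HEADING.exec(text);
  if (!m) return undefined;
  const name = m[1].toLowerCase();
  return name.charAt(0).toUpperCase() + name.slice(1);
}

// Fallback title from a file name: "docs/my_paper-v2.pdf" -> "my paper v2".
function fallbackTitle(fileName: string): string {
  return fileName
    .replace(/^.*[\\/]/, "") // drop directories
    .replace(/\.[^.]+$/, "") // drop the extension
    .replace(/[-_\s]+/g, " ") // separators become single spaces
    .trim();
}

interface ChunkMetadata {
  source: string;
  title: string;
  page: number;
  chunk: number; // 1-based position within the page (assumption)
  section: string;
}

// Build metadata for the chunks of one page. A chunk without its own heading
// inherits the section of the chunk before it.
function buildChunkMetadata(
  source: string,
  title: string | undefined,
  page: number,
  chunks: string[],
  startSection = "Unknown",
): ChunkMetadata[] {
  const resolvedTitle = title?.trim() || fallbackTitle(source);
  let section = startSection;
  return chunks.map((text, i) => {
    section = detectSection(text) ?? section;
    return { source, title: resolvedTitle, page, chunk: i + 1, section };
  });
}
```

Keeping the current section as running state matters because a splitter will often cut a long Methods or Results section into several chunks, and only the first of them contains the heading.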
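
The citation keys from the Citation-Aware Generation bullet can be derived deterministically from chunk metadata, which is what makes their stability testable. In this sketch only the [document-name:pN:cM] shape and the Rerank Score label come from the PR description; the slug rules, the RetrievedChunk shape, the context layout, the prompt wording, and the hypothetical unknownCitations post-check (which flags keys the model cited but was never given) are assumptions.

```typescript
// Shape of a reranked chunk handed to the prompt builder (assumed, not the PR's type).
interface RetrievedChunk {
  source: string; // file name or path of the original document
  page: number;
  chunk: number;
  text: string;
  rerankScore: number;
}

// Turn a file name into a stable slug: "papers/Deep Learning Survey.pdf" -> "deep-learning-survey".
function documentSlug(source: string): string {
  return source
    .replace(/^.*[\\/]/, "") // drop directories
    .replace(/\.[^.]+$/, "") // drop the extension
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything else into single dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// Citation key in the [document-name:pN:cM] shape described in the PR.
function citationKey(c: Pick<RetrievedChunk, "source" | "page" | "chunk">): string {
  return `[${documentSlug(c.source)}:p${c.page}:c${c.chunk}]`;
}

// Render reranked chunks as a context block, each headed by its key and score.
function formatContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c) => `${citationKey(c)} (Rerank Score: ${c.rerankScore.toFixed(3)})\n${c.text}`)
    .join("\n\n");
}

// Illustrative prompt; the PR's actual wording is not reproduced here.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  return [
    "Answer using only the context below.",
    "Cite the supporting chunk after each claim using its key exactly as written, e.g. [document-name:p1:c2].",
    "",
    formatContext(chunks),
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Return citation keys that appear in the answer but were not in the context.
function unknownCitations(answer: string, chunks: RetrievedChunk[]): string[] {
  const known = new Set(chunks.map((c) => citationKey(c)));
  const cited = answer.match(/\[[a-z0-9-]+:p\d+:c\d+\]/g) ?? [];
  return Array.from(new Set(cited)).filter((k) => !known.has(k));
}
```

Because these keys depend only on file name, page, and chunk index, two files whose names slugify identically would collide; a real implementation may need a disambiguator such as a short content hash.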

This architecture keeps all data on the user's local machine while adding cross-encoder reranking on top of embedding-based retrieval from ChromaDB.
