feat: enhanced scientific RAG pipeline with cross-encoder reranking and metadata filtering by hobyt-aluzar · Pull Request #5 · aietal/aimengpt

hobyt-aluzar · 2026-05-14T07:25:34Z

Enhanced Scientific RAG Pipeline with Cross-Encoder Reranking

This PR introduces an enhanced scientific RAG pipeline intended to improve retrieval quality and citation accuracy. Semantic reranking runs locally, so queries and documents stay private.

Key Changes:

Local Reranking: Integrated @xenova/transformers with the Xenova/ms-marco-MiniLM-L-6-v2 cross-encoder. The pipeline fetches an expanded pool of candidate documents from ChromaDB, scores each one against the user's query with the cross-encoder, and returns the top results. Reranking runs entirely on the local machine.
Scientific Chunking & Metadata: Enhanced inject-documents.ts to split text on separators for scientific sections (e.g., Abstract, Methods, Results). Each chunk's metadata now records its section and page, plus a generated fallback title when the document has none.
Citation-Aware Generation: Modified rag-chat.ts to pass the Rerank Score and precise citation keys (e.g., [document-name:p1:c2]) directly into the LLM context. The prompt now instructs the model to cite these keys, so that each claim in a response can be traced back to its source chunk.
Test Coverage: Added scientific-rag.test.ts to ensure stability of citation keys, title normalization, and section detection.
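
The over-fetch, score, and truncate flow described under Local Reranking can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Candidate`, `Ranked`, `PairScorer`, `selectTopK`, and `rerank` are hypothetical names, and the scorer is injected as a plain function so the sketch does not depend on @xenova/transformers or ChromaDB.

```typescript
// Hypothetical types for illustration; the PR's actual types may differ.
interface Candidate {
  id: string;
  text: string;
}

interface Ranked extends Candidate {
  rerankScore: number;
}

// Maps (query, passage) to a relevance score; higher means more relevant.
// In the PR this role is played by the cross-encoder model.
type PairScorer = (query: string, passage: string) => Promise<number>;

// Pure step: attach scores, sort descending, keep the best topK.
// Array.prototype.sort is stable, so ties keep their retrieval order.
function selectTopK(
  candidates: Candidate[],
  scores: number[],
  topK: number,
): Ranked[] {
  if (candidates.length !== scores.length) {
    throw new Error("candidates and scores must have the same length");
  }
  return candidates
    .map((c, i) => ({ ...c, rerankScore: scores[i] ?? Number.NEGATIVE_INFINITY }))
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, Math.max(0, topK));
}

// Async wrapper: score every candidate against the query, then select.
async function rerank(
  query: string,
  candidates: Candidate[],
  score: PairScorer,
  topK: number,
): Promise<Ranked[]> {
  const scores = await Promise.all(candidates.map((c) => score(query, c.text)));
  return selectTopK(candidates, scores, topK);
}
```

In this sketch the caller would retrieve more candidates from the vector store than it finally needs (for example 20 for a final top 5; these numbers are illustrative) and pass them to `rerank` together with a scorer backed by the cross-encoder.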

This architecture keeps all data on the user's local machine while aiming for near state-of-the-art semantic retrieval quality.
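
As an illustration of the citation keys mentioned under Citation-Aware Generation, the sketch below builds a key in the `[document-name:p1:c2]` shape and prefixes each chunk with it before the text is handed to the LLM. It assumes that `p` denotes the page and `c` the chunk index, and that titles are normalized to lowercase hyphenated slugs; `ChunkMeta`, `ScoredChunk`, `normalizeTitle`, `citationKey`, and `formatContext` are hypothetical names rather than the helpers in rag-chat.ts.

```typescript
// Hypothetical metadata shape; field names are assumptions for this sketch.
interface ChunkMeta {
  title: string;      // document title, or a generated fallback
  page: number;       // page number (assumed meaning of "p")
  chunkIndex: number; // chunk index (assumed meaning of "c")
  section?: string;   // e.g. "Abstract", "Methods", "Results"
}

interface ScoredChunk {
  meta: ChunkMeta;
  text: string;
  rerankScore: number;
}

// Turn a title or file name into a stable, lowercase, hyphenated slug.
function normalizeTitle(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/\.pdf$/, "")
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a key such as "[deep-learning-survey:p1:c2]".
function citationKey(meta: ChunkMeta): string {
  const name = normalizeTitle(meta.title) || "untitled";
  return `[${name}:p${meta.page}:c${meta.chunkIndex}]`;
}

// Render chunks as context blocks, each headed by its key and rerank score.
function formatContext(chunks: ScoredChunk[]): string {
  return chunks
    .map((c) => {
      const section = c.meta.section ?? "unknown";
      const score = c.rerankScore.toFixed(3);
      const header = `${citationKey(c.meta)} (section: ${section}, rerank score: ${score})`;
      return `${header}\n${c.text}`;
    })
    .join("\n\n");
}
```

Keeping the key derivation in one pure function makes it straightforward to test that the same chunk always yields the same key, which is the kind of stability check the PR's test file is described as covering.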

feat: enhanced scientific RAG pipeline with cross-encoder reranking a…

e4718f2

…nd metadata filtering

hobyt-aluzar wants to merge 1 commit into aietal:master from hobyt-aluzar:feature/enhanced-scientific-rag