feat: improve scientific RAG citations #4
|
Hi, I opened a PR for the ISAAC/AimenGPT RAG bounty: #4. It improves the existing RAG pipeline with scientific-aware PDF chunking, stable citation keys, retrieval distances, stricter citation prompting, and tests. Could you confirm whether this fits the bounty scope? |
|
I pushed an additional hardening commit to PR #4. New improvements:
Validation:
|
|
I did an independent local validation of this PR on Windows with Node/npm from the ui/ directory.

Commands/results:

- |
|
One small follow-up from reading the scientific section helper: "materials and methods" was checked after "methods", so that compound heading would be classified as "methods". I opened a narrow PR against the source branch with a reorder and a regression assertion: Vinzz2303#1.
Validation on the follow-up branch:
|
Merge follow-up regression for Materials and Methods section detection.
|
Update: I reviewed and merged the follow-up PR from @cerredz into this branch. It tightens scientific section detection for the compound "materials and methods" heading.
Validation after merging:
cd ui
npx vitest run __tests__/scientific-rag.test.ts --reporter verbose
# 5 passed
npx tsc --noEmit
# passed
cd ..
git diff --check
# no output |
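The ordering issue behind this follow-up can be illustrated with a minimal sketch. The section labels, the pattern table, and the `detectSection` name are hypothetical and are not taken from the PR's helper; the point is only that a more specific heading has to be tested before a shorter one it contains.

```typescript
// Hypothetical section labels, for illustration only.
type Section = "materials and methods" | "methods" | "results" | "unknown";

// Order matters: the compound heading is listed before the shorter
// "methods" pattern, otherwise "methods" would match first.
const SECTION_PATTERNS: Array<[Section, RegExp]> = [
  ["materials and methods", /\bmaterials\s+and\s+methods\b/i],
  ["methods", /\bmethods\b/i],
  ["results", /\bresults\b/i],
];

// Returns the first label whose pattern matches the heading text.
function detectSection(heading: string): Section {
  for (const [name, pattern] of SECTION_PATTERNS) {
    if (pattern.test(heading)) return name;
  }
  return "unknown";
}
```

With the two `methods` entries swapped, a heading such as "2. Materials and Methods" would fall through to the plain `methods` label, which is the behaviour the regression assertion guards against.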
|
Pushed one more narrow hardening follow-up to this PR.

New changes:
- handle empty or unexpectedly shaped Chroma retrieval results without throwing in the RAG chat path
- return an explicit no-documents context when retrieval produces no usable chunks
- guard PDF ingestion against missing/non-array upload payloads before constructing the PDF loader
- add regression coverage for defensive Chroma retrieval formatting

Validation from ui/:
- |
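As a rough illustration of the defensive retrieval formatting described above, the sketch below tolerates missing or oddly shaped results and falls back to an explicit no-documents context. The `RawQueryResult` shape (nested arrays, one inner array per query), the `formatRetrieval` name, and the fallback message are assumptions made for this example; they are not the PR's actual code.

```typescript
// Assumed, loosely typed query result: each field may be missing or malformed.
interface RawQueryResult {
  ids?: unknown;
  documents?: unknown;
  distances?: unknown;
}

interface RetrievedChunk {
  id: string;
  text: string;
  distance: number | null;
}

// Hypothetical fallback text used when nothing usable was retrieved.
const NO_DOCUMENTS_CONTEXT = "No relevant documents were retrieved.";

// Returns the first inner array, or [] when the value is not a nested array.
function firstRow(value: unknown): unknown[] {
  if (!Array.isArray(value) || !Array.isArray(value[0])) return [];
  return value[0];
}

function formatRetrieval(
  result: RawQueryResult | null | undefined
): { chunks: RetrievedChunk[]; context: string } {
  const ids = firstRow(result?.ids);
  const docs = firstRow(result?.documents);
  const distances = firstRow(result?.distances);

  const chunks: RetrievedChunk[] = [];
  docs.forEach((doc, i) => {
    // Skip null, non-string, or blank documents instead of throwing.
    if (typeof doc !== "string" || doc.trim() === "") return;
    const rawId = ids[i];
    const rawDistance = distances[i];
    chunks.push({
      id: typeof rawId === "string" ? rawId : `chunk-${i}`,
      text: doc,
      distance: typeof rawDistance === "number" ? rawDistance : null,
    });
  });

  if (chunks.length === 0) return { chunks, context: NO_DOCUMENTS_CONTEXT };
  const context = chunks.map((c) => `[${c.id}] ${c.text}`).join("\n\n");
  return { chunks, context };
}
```

Returning a fixed context string rather than an empty one lets the prompt state plainly that no sources were found, instead of leaving the model to answer without grounding.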
Summary
Motivation
The existing document chat flow uploads PDFs into Chroma and answers from retrieved chunks, but citations are hard to trace back to stable document/page/chunk identifiers. This makes scientific QA harder to verify. This PR adds a small citation layer around the existing RAG flow so answers can cite exact retrieved chunks such as paper-title:p3:c2.
What changed
ui/utils/server/scientific-rag.ts with helpers for:
CHROMA_PATH consistently
Testing
From ui/:
npm test -- scientific-rag.test.ts --run
npx tsc --noEmit
npm run lint
Results:
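For readers unfamiliar with the citation key format mentioned under Motivation (paper-title:p3:c2), here is a minimal sketch of how such a key could be derived from a document title, page number, and chunk index. The function names and the slug rules are assumptions for illustration; the helpers in ui/utils/server/scientific-rag.ts may differ.

```typescript
// Hypothetical helper: turns a title or file name into a lowercase, hyphenated slug.
function slugifyTitle(title: string): string {
  const slug = title
    .toLowerCase()
    .replace(/\.pdf$/, "") // drop a trailing .pdf extension
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return slug || "document"; // assumed fallback for empty titles
}

// Hypothetical helper: builds a key of the form <slug>:p<page>:c<chunk>.
function buildCitationKey(title: string, page: number, chunk: number): string {
  return `${slugifyTitle(title)}:p${page}:c${chunk}`;
}

// Example: a chunk with index 2 on page 3 of "Paper Title.pdf".
const exampleKey = buildCitationKey("Paper Title.pdf", 3, 2);
```

Because the key depends only on the title, page, and chunk index, re-ingesting the same PDF with the same chunking yields the same keys, which is what makes a citation traceable.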