Skip to content

ipccheng/doc-reviews

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

doc-reviews

Nutanix technical questionnaire reviews — completed by Sam (AI assistant) using battlecard-anchored RAG search.

The reviews in this repo are the output of a controlled comparison between two approaches to AI-assisted technical document review: RAG-grounded (live retrieval from Nutanix KB and docs) vs direct LLM (no retrieval, training-cutoff knowledge only).


RAG vs Non-RAG: Real-World Comparison

The comparison below is from actual production queries during the same session. Same question, same model, same context — only the retrieval pipeline differs.

Query: "What are the differences between Red Hat OpenShift and Nutanix Native Hyperconverged Infrastructure?"

Metric Non-RAG (Direct LLM) RAG-Grounded
Answer quality Hallucinated Battlecard-sourced
Latency ~1–2s ~6–8s
Input tokens 14,394 ~1,700
Output tokens 1,621 ~1,600
Total tokens 75,631 ~3,300
Training knowledge Frozen at cutoff Live from current KB

What changed with RAG

  • 23× fewer tokens — retrieval grounds the answer so the model doesn't need to "guess" its context into existence
  • Battlecard sourcing — answers cite specific KB numbers, product names, and version facts rather than sounding plausible
  • Domain accuracy — Nutanix-specific claims (AOS versions, RF2/RF3 behaviour, NCC health checks) are verified against actual docs, not hallucinated
  • ~6–8s overhead — vector search + reranking + light LLM synthesis adds ~6s on top of direct LLM; acceptable for quality-sensitive work

When to use each

Approach Best for
Direct LLM Speed-first tasks where training knowledge is sufficient: drafting generic content, brainstorming, language polishing
RAG Domain-specific questions where accuracy matters: Nutanix configs, KB references, version lifecycle, compatibility matrices

For the Nutanix questionnaire work in this repo, all reviews use the RAG-grounded approach — because accuracy against specific product versions, hardware compatibility lists, and lifecycle dates is what determines whether a submission is marked responsive or non-responsive.


Repo Structure

doc-reviews/
├── README.md          ← this file
└── review.md          ← completed questionnaire review (2026-04-21)

Tools Used

  • Sam (this agent): RAG search via nutanix_rag_search.py → LanceDB nutanix_rag_v3 → Gemma 4 31B reranking → synthesised answer
  • NX_Shield (parallel agent, external engineers): Direct LLM (no RAG) — for contrast baseline
  • Vector DB: LanceDB (nutanix_rag_v3, ~170k rows) with Jina AI embeddings
  • Search: SearXNG (self-hosted, key-free) for live web fallback

About

Document reviews

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors