LegalQA

LegalQA is a research project exploring Retrieval-Augmented Generation (RAG) for legal document question answering. It compares single-pass RAG pipelines against a multi-hop iterative retrieval agent on U.S. Supreme Court case law from the Caselaw Access Project (CAP).

Features

Dataset: Built from the Caselaw Access Project (U.S. Supreme Court opinions, 1984–2014): https://static.case.law/us/

Data Processing:

  1. Normalize raw case JSON into a slim format (cases_slim.jsonl).
  2. Chunk opinions into passages for retrieval (see the sketch below).
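
A minimal sketch of the chunking step, assuming fixed-size word windows with overlap; the field names (id, text) and window parameters here are illustrative, not the repo's exact settings:

```python
import json

def chunk_opinion(text, case_id, size=250, overlap=50):
    """Split one opinion into overlapping word-window passages (assumed parameters)."""
    words = text.split()
    step = size - overlap
    chunks = []
    for i in range(0, max(len(words) - overlap, 1), step):
        chunks.append({
            "case_id": case_id,
            "chunk_id": f"{case_id}_{i // step}",
            "text": " ".join(words[i:i + size]),
        })
    return chunks

# Turn slim cases into retrieval-ready passages.
with open("data/processed/cases_slim.jsonl") as fin, \
     open("data/processed/chunks.jsonl", "w") as fout:
    for line in fin:
        case = json.loads(line)  # assumes "id" and "text" fields
        for chunk in chunk_opinion(case["text"], case["id"]):
            fout.write(json.dumps(chunk) + "\n")
```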

Retrieval: FAISS index with bge-small-en embeddings, plus optional cross-encoder reranker.
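
A sketch of what retrieval with the optional reranker could look like, assuming a FAISS index built as in the Workflow section below; the reranker model and file paths are assumptions:

```python
import json
import faiss
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed reranker model
index = faiss.read_index("data/index/cases.faiss")               # hypothetical path
chunks = [json.loads(line) for line in open("data/processed/chunks.jsonl")]

def retrieve(query, top_k=3, rerank=True, candidates=20):
    """Embed the query, search FAISS, then optionally rerank with the cross-encoder."""
    q = embedder.encode([query], normalize_embeddings=True)
    _, idx = index.search(q, candidates if rerank else top_k)
    hits = [chunks[i] for i in idx[0]]
    if rerank:
        scores = reranker.predict([(query, h["text"]) for h in hits])
        hits = [h for _, h in sorted(zip(scores, hits), key=lambda p: -p[0])]
    return hits[:top_k]
```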

Pipelines:

  1. Baseline: Single-pass retrieval → prompt LLM with top-k chunks.
  2. Iterative Agent: A multi-step reasoning loop of retrieval → self-check → query refinement → final answer (sketched below).
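
A minimal sketch of the iterative loop, reusing the repo's retrieve(), build_prompt(), and ask_llm() helpers; the self-check prompt and stopping rule here are illustrative:

```python
def iterative_agent(query, max_hops=3, top_k=3):
    """Retrieve, answer, self-check, and refine until the check passes (illustrative loop)."""
    current = query
    answer = ""
    for hop in range(max_hops):
        retrieved = retrieve(current, top_k=top_k)
        answer = ask_llm(build_prompt(current, retrieved))
        context = "\n".join(chunk["text"] for chunk in retrieved)
        verdict = ask_llm(
            f"Passages:\n{context}\n\nQuestion: {query}\nAnswer: {answer}\n\n"
            "Is the answer fully supported by the passages? Reply YES, or NO followed by "
            "a refined search query on the next line."
        )
        if verdict.strip().upper().startswith("YES"):
            break
        lines = verdict.strip().splitlines()
        current = lines[1].strip() if len(lines) > 1 else query  # fall back to original query
    return answer
```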

Evaluation Metrics:

  1. Answer semantic similarity (vs. gold QA set).
  2. Citation precision & recall (see the sketch after this list).
  3. Hallucination rate.
  4. Hop effectiveness (did extra retrieval help?).
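
As a sketch, citation precision and recall can be computed by comparing the citations extracted from an answer against the gold set; the reporter-citation regex and input format are assumptions:

```python
import re

def citation_metrics(answer, gold_citations):
    """Precision/recall over 'U.S.' reporter citations found in the answer text."""
    predicted = set(re.findall(r"\d+ U\.S\. \d+", answer))  # assumed citation format
    gold = set(gold_citations)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Example: one correct citation out of two predicted → precision 0.5, recall 1.0.
print(citation_metrics("See 560 U.S. 1 and 130 U.S. 2.", ["560 U.S. 1"]))
```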

Workflow:

  1. Data Prep: data/raw/json/*.json → data/processed/cases_slim.jsonl → data/processed/chunks.jsonl
  2. Indexing: Embed chunks with BAAI/bge-small-en-v1.5 and store in FAISS (see the sketch after this list).
  3. QA Pipelines: retrieve(query) → build_prompt() → ask_llm(), then iterative_agent(query) with self-check and refinement.
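
A minimal indexing sketch under those assumptions; the flat inner-product index and output path are illustrative choices:

```python
import json
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
chunks = [json.loads(line) for line in open("data/processed/chunks.jsonl")]

# Normalized embeddings make inner product equivalent to cosine similarity.
vectors = embedder.encode([c["text"] for c in chunks],
                          normalize_embeddings=True, show_progress_bar=True)

index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
faiss.write_index(index, "data/index/cases.faiss")  # hypothetical path
```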

Evaluation:

Compare baseline vs iterative using gold_qa.jsonl.
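
A sketch of that comparison, assuming gold_qa.jsonl rows carry question and answer fields (names and path are assumptions) and using cosine similarity of all-MiniLM-L6-v2 embeddings, one of the models listed in the Tech Stack, as the semantic-similarity score:

```python
import json
from sentence_transformers import SentenceTransformer, util

scorer = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(prediction, gold):
    """Cosine similarity between the predicted and gold answer embeddings."""
    a, b = scorer.encode([prediction, gold], normalize_embeddings=True)
    return float(util.cos_sim(a, b))

for row in map(json.loads, open("data/processed/gold_qa.jsonl")):  # assumed path
    question, gold = row["question"], row["answer"]
    baseline = ask_llm(build_prompt(question, retrieve(question, top_k=3)))
    agent = iterative_agent(question)
    print(question,
          f"baseline={semantic_similarity(baseline, gold):.3f}",
          f"agent={semantic_similarity(agent, gold):.3f}")
```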

Example:

```python
query = "What did the Supreme Court say about international child abduction?"

retrieved = retrieve(query, top_k=3)
prompt = build_prompt(query, retrieved)
answer = ask_llm(prompt)
print(answer)
```

Results (sample):

  1. Baseline: Higher semantic similarity to gold answers.
  2. Iterative: More correct citations, but sometimes drifts in phrasing.

Tech Stack:

  1. Python, FAISS, pandas
  2. SentenceTransformers (BAAI/bge-small-en, all-MiniLM-L6-v2)
  3. OpenAI GPT models for answering and self-checking

Future Work:

  1. Use more sophisticated rerankers.
  2. Enrich gold answers with full case syllabi.
  3. Evaluate on other datasets (e.g., EUR-Lex, CUAD).

