cite-bench v1

cite-bench is a public benchmark for legal citation verification.

Given a citation and a quoted passage, a model must classify the pair as one of four labels:

  • VERIFIED — the quote appears in the cited provision
  • NOT_FOUND — the citation is real but the quote is fabricated or altered
  • MISATTRIBUTED — the quote is real legal text but from a different provision
  • CITATION_UNRESOLVED — the citation itself is malformed or nonexistent
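
For intuition, here are a few hypothetical pairs with the label each would receive. The citations and quotes below are invented for illustration and are not drawn from the benchmark pack:

# Hypothetical (citation, quote, label) triples, invented for illustration only.
examples = [
    ("17 U.S.C. § 107", "the fair use of a copyrighted work ... is not an infringement of copyright", "VERIFIED"),
    ("17 U.S.C. § 107", "fair use shall always be presumed for parody", "NOT_FOUND"),  # quote fabricated
    ("17 U.S.C. § 106", "the fair use of a copyrighted work ... is not an infringement of copyright", "MISATTRIBUTED"),  # real text, wrong provision
    ("17 U.S.C. § 9999", "any quote at all", "CITATION_UNRESOLVED"),  # citation does not resolve
]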

This public repo ships:

  • a public benchmark input pack
  • a blank submission template
  • sample prompts
  • a public runner that produces id,predicted_status CSV submissions

This public repo does not ship:

  • private grading keys
  • hidden eval or holdout packs
  • local scoring code
  • backend upload or grading services

Quick Start

git clone https://github.com/LawEngine/cite-bench.git
cd cite-bench
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
cp .env.example .env

Add your OPENAI_API_KEY to .env.
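
To confirm the key is visible before a run, here is a minimal check. It assumes the runner reads OPENAI_API_KEY from the process environment, so export the variable first if the runner normally loads .env itself:

import os

# Fail fast if the key is missing. If your shell has not exported .env,
# run `set -a; source .env; set +a` before this check.
key = os.environ.get("OPENAI_API_KEY")
if not key:
    raise SystemExit("OPENAI_API_KEY is not set")
print(f"OPENAI_API_KEY found ({len(key)} chars)")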

Repo Layout

cite-bench/
├── citebench/
├── data/
├── prompts/
├── results/
└── scripts/

Key tracked files:

  • data/cite-bench-v1.json
  • data/submission_template.csv
  • prompts/system_prompt.md
  • prompts/user_prompt.md
  • scripts/run_openai.py

Dataset Contract

Each row in the public pack contains only three fields:

  • id
  • citation
  • quote

The public pack intentionally does not include private grading metadata such as expected_status or internal source hints.
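
To inspect the pack locally, a short sketch that assumes data/cite-bench-v1.json is a JSON array of row objects with exactly these three fields:

import json

# Load the public pack and print the first row.
with open("data/cite-bench-v1.json", encoding="utf-8") as f:
    rows = json.load(f)

print(f"{len(rows)} rows")
first = rows[0]
print(first["id"], "|", first["citation"], "|", first["quote"][:60])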

Run The Public Benchmark

Smoke test:

.venv/bin/python scripts/run_openai.py \
  --limit 5 \
  --output outputs/submissions/probe_5.csv \
  --audit outputs/audit/probe_5.jsonl

Full public pack:

.venv/bin/python scripts/run_openai.py \
  --output outputs/submissions/citebench_v1.csv \
  --audit outputs/audit/citebench_v1.jsonl

You can also override model settings:

.venv/bin/python scripts/run_openai.py \
  --model gpt-5.4-mini \
  --reasoning-effort high \
  --concurrency 20 \
  --max-output-tokens 1600

Output Contract

Submission CSVs use this schema:

id,predicted_status

predicted_status must be exactly one of:

  • VERIFIED
  • NOT_FOUND
  • MISATTRIBUTED
  • CITATION_UNRESOLVED
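
Before uploading, you can sanity-check a submission against this contract. A minimal sketch, assuming your file lives at outputs/submissions/citebench_v1.csv:

import csv

ALLOWED = {"VERIFIED", "NOT_FOUND", "MISATTRIBUTED", "CITATION_UNRESOLVED"}

# Check the header and every predicted_status value.
def validate(path):
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["id", "predicted_status"]:
            raise ValueError(f"bad header: {reader.fieldnames}")
        for line_no, row in enumerate(reader, start=2):
            if row["predicted_status"] not in ALLOWED:
                raise ValueError(f"line {line_no}: invalid status {row['predicted_status']!r}")
    print(f"{path}: OK")

validate("outputs/submissions/citebench_v1.csv")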

Public Prompt Baseline

The tracked prompt pair is:

  • prompts/system_prompt.md
  • prompts/user_prompt.md

These are public baseline prompts, not private max-performance prompts.

Scoring Boundary

Official grading is intentionally not included in this public repo.

The public repo is for:

  • downloading the public pack
  • running a model
  • generating a valid submission CSV

Private grading keys and backend upload/scoring logic live outside this repo.

Licensing

Software source code in this repository is licensed under Apache-2.0. See LICENSE.

The benchmark dataset, prompt files, and benchmark-facing documentation are licensed under CC BY 4.0. See DATA_LICENSE.md.

Development Notes

Private operator notes and local setup docs are intentionally kept out of this repository's tracked paths.
