# ProDa

An AI data-construction and model-iteration workbench for vertical domains.
From raw documents to Benchmark / SFT / fine-tuning / evaluation / diagnostic data augmentation, all in one loop.
Quick Start · Showcase · Workflow · Core Features · Fine-Tuning · OpenCompass Evaluation · Diagnostic Iteration
中文 · English
ProDa is not just a collection of data-generation scripts. It is a VSCode-style Web IDE built for iterative model improvement.
It integrates document parsing, knowledge extraction, Benchmark construction, SFT data generation, LLaMA-Factory fine-tuning, OpenCompass evaluation, error diagnosis, and second-round data augmentation into one traceable project workflow.
```
Document
   ↓
Knowledge Core
   ↓
Benchmark / SFT Data
   ↓
Fine-Tuning
   ↓
OpenCompass Evaluation
   ↓
Diagnosis + Supplement Data
   ↓
Second-Round Iteration
```
- Why ProDa
- Showcase
- What You Get
- Quick Start
- Recommended Workflow
- Project Structure
- Artifact Layout
- FAQ
- Current Status
- Roadmap
- Acknowledgements
- Star History
- Contributing
- Citation
- License
## Why ProDa

You may have run into these problems:
- You have many domain documents, but they are hard to turn into reliable training data.
- Benchmark generation, SFT data construction, training, and evaluation are scattered across scripts.
- After fine-tuning, a single score does not tell you what the model got wrong or how to improve it.
ProDa turns the whole process into a visual, project-based, traceable loop.
| Traditional workflow | ProDa |
|---|---|
| Multiple scripts glued together manually | One project workbench for the full pipeline |
| Data, training logs, and eval outputs scattered around | All states and artifacts are automatically archived per project |
| Only aggregate scores after evaluation | Sample-level results, error annotations, and diagnostic reports |
| Second-round iteration depends on manual intuition | Error-driven supplement data and merged training sets |
| Trained artifacts are hard to verify immediately | Chat directly with a model / checkpoint using streaming output |
## What You Get

| Module | What you can do | Output |
|---|---|---|
| Document Processing | Upload domain documents and extract knowledge cores | L1 / L2 / L3 knowledge structures |
| Benchmark | Generate evaluable questions from reasoning chains | MCQ Benchmark |
| SFT Data | Generate training data with configurable question-type ratios | FineTune / ShareGPT data |
| Fine-Tuning | Train models through LLaMA-Factory | Checkpoints / LoRA artifacts |
| Model Chat | Chat with historical models or checkpoints | Streaming replies and parameter validation |
| OpenCompass | Evaluate local/API models | Leaderboard, comparison charts, sample details |
| Diagnostic Supplement | Analyze error samples and generate targeted data | Diagnostic reports and second-round training sets |
**Project Management**

- Create, switch, and delete projects
- Automatically archive project states and artifacts
- Review historical training and evaluation runs
**Document Processing**

- Supports `pdf` / `txt` / `md` / `docx`
- Extracts a three-level knowledge representation: L1 concepts / L2 statements / L3 reasoning chains
- Supports chunking, parallel extraction, editable tables, and export
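For intuition, here is a minimal sketch of what an exported three-level knowledge core could look like; the field names are illustrative assumptions, not the project's exact export schema:

```python
# Illustrative sketch only; the actual export schema may differ.
knowledge_core = {
    # L1: atomic domain concepts
    "l1_concepts": ["photosynthesis", "chlorophyll"],
    # L2: factual statements that link concepts
    "l2_statements": [
        "Chlorophyll absorbs light energy in the thylakoid membrane.",
    ],
    # L3: multi-step reasoning chains built on L2 statements
    "l3_reasoning_chains": [
        {
            "premises": ["Chlorophyll absorbs light energy."],
            "steps": ["Absorbed energy drives the light-dependent reactions."],
            "conclusion": "Light availability constrains the photosynthesis rate.",
        },
    ],
}
```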
**Benchmark Generation**

- Automatically generates multiple-choice Benchmark data from L3 reasoning chains
- Supports concurrency, retries, cancellation, resume, preview, and editing
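Since Step6 later scores these items with a multiple-choice postprocessor, each generated item presumably carries a question, options, and a gold answer letter. A hypothetical record, for illustration only:

```python
# Hypothetical benchmark item; field names are assumptions, not the real schema.
mcq_item = {
    "question": "Which pigment absorbs light energy during photosynthesis?",
    "options": {"A": "Chlorophyll", "B": "Keratin", "C": "Hemoglobin", "D": "Cellulose"},
    "answer": "A",
    "source_chain": "the L3 reasoning chain this item was generated from",
}
```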
**SFT Data Generation**

- Controls QA / single-choice / multiple-choice / true-false ratios
- Supports sampling windows, constraints, and history review
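The ShareGPT export presumably follows the conversation format that LLaMA-Factory consumes; a minimal sketch of one record (the exact fields ProDa emits may differ):

```python
# Minimal ShareGPT-style record; the exact fields ProDa emits may differ.
sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "What limits photosynthesis at low light intensity?"},
        {"from": "gpt", "value": "At low light intensity, the rate of light absorption is the limiting factor."},
    ],
}
```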
**Diagnostic Supplement**

- Generates structured diagnostic reports from OpenCompass error samples
- Produces targeted supplement data based on issue types
- Merges supplement data with original data for second-round training
**Fine-Tuning**

- Integrates with LLaMA-Factory
- Visual training-parameter configuration
- Live logs and Loss / LR curves
- Training history and output-directory management
**Model Chat**

- Streaming chat verification for trained models / checkpoints
- Supports both local models and API models
- Auto-detects LoRA / PEFT paths
**OpenCompass Evaluation**

- Result views: Leaderboard / Comparison / Samples
- Sample-level error annotation connected to diagnosis
**Project Overview**

- Unified view of key project artifacts and activity timeline
- Easier export and review
## Showcase

Project walkthrough video: `ProDA_IDE.mp4` (click to play).

Screenshots (see the repository images):

- IDE Overview
- Document Extraction and Knowledge Core
- Fine-Tuning and Training Curves
- OpenCompass Result Dashboard
- Model Chat Verification
## Quick Start

### 1. Create the environment

```bash
conda create -n ProDa python=3.10 -y
conda activate ProDa
pip install -r requirements.txt
```

ProDa depends on the following external projects:

- `LLaMA-Factory` for training
- `OpenCompass` for evaluation
Place `LlamaFactory`, `opencompass`, and `Model` directly under the `ProDA/` root:

```
ProDA/
├── backend/
├── frontend/
├── proda/
├── LlamaFactory/    # training repo
├── opencompass/     # evaluation repo
├── Model/           # put all downloaded local models here
│   ├── Qwen3-8B/
│   └── ...
└── ...
```
Then install the LlamaFactory / OpenCompass dependencies in the same runtime environment.

> Recommendation: in Step5, set `model_root` to `ProDA/Model` so model discovery works out of the box.
### 2. Install LLaMA-Factory and OpenCompass

The commands below assume you are at the `ProDA/` root and the `ProDa` env is already activated.

```bash
cd LlamaFactory
pip install -e .
pip install -r requirements/metrics.txt
cd ..
```

```bash
cd opencompass
pip install -e .
# Optional: full dataset support
# pip install -e ".[full]"
# Optional: API evaluation extras
# pip install -e ".[api]"
cd ..
```

> Note: You can install OpenCompass via `pip install -U opencompass`, but a source install is recommended here so the project patch script works consistently.
#### 2.1 Patch OpenCompass postprocessing

ProDa Step6 configs enable `eval_cfg.pred_postprocessor = parse_multi_choice_answer` (see `proda/evaluator.py`).

To make this work on a clean upstream OpenCompass checkout, run exactly one command at the `ProDA/` root:

```bash
bash scripts/patch_opencompass_postprocess.sh
```

This script applies the same local logic used in your current setup to `ProDA/opencompass`:
- injects/registers `parse_multi_choice_answer` into `opencompass/utils/text_postprocessors.py`
- injects a robust postprocessor load/reload fallback into `opencompass/tasks/openicl_eval.py`
- injects the same prompt-extraction logic for compact details output
- runs import + registry checks automatically after patching
If your OpenCompass checkout is not at the default location, pass a path:

```bash
bash scripts/patch_opencompass_postprocess.sh /path/to/opencompass
```

If the script fails, your OpenCompass version is likely too different from the expected anchors; switch to the matching baseline and retry.
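For intuition, here is a minimal sketch of what a multiple-choice answer postprocessor typically does. The function name matches the one the patch registers, but this body is an illustration, not the project's actual implementation (see `proda/evaluator.py` for that):

```python
import re

# Illustrative sketch, NOT the project's actual implementation
# (see proda/evaluator.py and the patched text_postprocessors.py).
def parse_multi_choice_answer(text: str) -> str:
    """Extract an A-D option letter from a free-form model reply."""
    # Prefer an explicit "Answer: X" pattern when the model emits one.
    match = re.search(r"[Aa]nswer\s*[::]?\s*([A-D])\b", text)
    if match:
        return match.group(1)
    # Otherwise fall back to the last standalone option letter in the reply.
    letters = re.findall(r"\b([A-D])\b", text)
    return letters[-1] if letters else ""
```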
### 3. Start the backend

```bash
uvicorn backend.main:app --host 0.0.0.0 --port 8002 --reload --reload-dir backend --reload-dir ProDa
```

### 4. Start the frontend

```bash
cd frontend
yarn install
yarn dev --host 0.0.0.0 --port 8503
```

Open http://localhost:8503 in your browser.

For remote servers, set up port forwarding first:

```bash
ssh -L 8503:localhost:8503 -L 8002:localhost:8002 <your-server>
```

## Recommended Workflow

```
Create Project
   ↓
Configure LLM API
   ↓
Extract Knowledge Core
   ↓
Generate Benchmark + SFT Data
   ↓
Fine-Tune with LLaMA-Factory
   ↓
Evaluate with OpenCompass
   ↓
Diagnose Errors
   ↓
Generate Supplement Data
   ↓
Second-Round Fine-Tuning
   ↓
Second-Round Evaluation with OpenCompass
```
- Create a project
- Configure and select an LLM API
- Step1: extract the knowledge core
- Step2: generate Benchmark data
- Step3: generate FineTune data
- Step5: run fine-tuning
- Step6: run evaluation
- Step3: diagnose errors and generate supplement data
- Step5: run second-round fine-tuning
- Step6 / Step7: compare iteration gains
## Project Structure

```
ProDa/
├── backend/           # FastAPI backend
├── frontend/          # React + Vite frontend IDE
├── ProDa/             # Core pipeline logic
├── ui/                # Legacy Streamlit UI kept for compatibility
├── requirements.txt
├── README.md
└── README_zh.md
```
## Artifact Layout

Each project's artifacts are stored under:

```
.ProDa_projects/<project_id>/
```

Common subdirectories:

- `state.json`: project state
- `finetune_exports/`: training configs, logs, and training history
- `model_outputs/`: trained model artifacts
- `evaluations/opencompass/`: evaluation inputs, results, and history
- `diagnosis/`: diagnostic reports, supplement data, and history
- `workflow/`: second-round workflow state
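Assuming `state.json` is plain JSON (an assumption; check your own project's files), a quick way to peek at a project's saved state:

```python
import json
from pathlib import Path

# Replace <project_id> with a real project ID from .ProDa_projects/.
project_dir = Path(".ProDa_projects") / "<project_id>"
state = json.loads((project_dir / "state.json").read_text())
print(sorted(state))  # top-level keys of the project state
```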
## FAQ

**The Web UI doesn't open?**

Make sure both frontend and backend are running, and that port forwarding includes both the frontend and backend ports.

If you are running in a cluster terminal environment, request a compute node and run `hostname` to get the HTTP host, then update the API target in `frontend/vite.config.ts`.

**Step5 reports no available training data?**

Generate and save data in Step3 first, or finish supplement-data merging.

**OpenCompass evaluation fails?**

Check all of the following:

- OpenCompass path, base model path, and LoRA path
- whether OpenCompass and ProDa run in the same Python environment
- whether OpenCompass completed the 3-step `parse_multi_choice_answer` patch (see 2.1 above)
- whether Step5 `model_root` points to `ProDA/Model` (to avoid model-discovery misses)

**Evaluation or training seems stuck at startup?**

This is normal in cluster environments, especially during first model load, tokenizer cache building, or multi-GPU initialization.
## Current Status

The current version already covers the main loop from data construction to evaluation and diagnostic iteration:
- Document processing and knowledge extraction
- Benchmark generation with resume support
- FineTune data generation
- Local fine-tuning
- OpenCompass evaluation
- Diagnostic reports and supplement data
- Second-round training iteration
- Streaming chat verification for fine-tuned models
## Acknowledgements

ProDa is built on top of these excellent projects and ecosystems:

- LLaMA-Factory: efficient fine-tuning framework
- OpenCompass: large-model evaluation system
- FastAPI: backend API service
- React / Vite: frontend interaction and tooling
- VSCode / Cursor: key inspirations for the IDE-style experience
Thanks also to everyone who provides feedback from real-world domain workflows.
## Star History

ProDa is not intended to be a toy demo; it aims to move closer to a practical workbench for domain model iteration.

If this project helps you, please consider giving it a Star.
## Contributing

Issues and PRs are welcome.
Good contribution directions include:
- More real-world domain data workflows
- Better OpenCompass sample-level visualizations
- Stronger diagnostic reports and supplement strategies
- Better cluster deployment documentation
- Docker / Conda environment files
- README screenshots, demos, and tutorials
If you have a real-world scenario in education, healthcare, finance, industry, or other vertical domains, feedback is especially welcome.
## Citation

```bibtex
@article{pan2026programming,
  title={Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora},
  author={Pan, Chenkai and Xu, Xinglong and Xu, Yuhang and Wu, Yujun and Li, Siyuan and Chen, Jintao and He, Conghui and Wei, Jingxuan and Tan, Cheng},
  journal={arXiv preprint arXiv:2604.24819},
  year={2026}
}
```
## License

MIT.
ProDa is intended for education, research, and technical exchange.
If you find this project interesting, feel free to Star / Fork / try the full loop.