-
Notifications
You must be signed in to change notification settings - Fork 35
Open
Description
Implement a lightweight, self-contained script that runs and reports accuracy scores on AgentClinic benchmark
Requirements:
- supports open-source local models with vLLM
- also supports a variety of closed-source models like OpenAI, Anthropic, Gemini
- Flexibility with prompts/chat templates/etc.
- Implement the MedQA and NEJM versions of the benchmark
Codebase: https://github.com/SamuelSchmidgall/AgentClinic
Highly recommend using Prime Intellect Environments (https://github.com/PrimeIntellect-ai/prime-environments)
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels