AgentClinic

Implement a lightweight, self-contained script that runs the AgentClinic benchmark and reports accuracy scores.

Requirements:
1. Support open-source local models via vLLM
2. Support closed-source models from providers such as OpenAI, Anthropic, and Gemini
3. Allow flexible configuration of prompts, chat templates, etc.
4. Implement both the MedQA and NEJM versions of the benchmark
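
One way to meet requirements 1 to 3 with a single code path is to treat every model as a plain "prompt in, text out" callable and keep the prompt template as a parameter. The sketch below is only an illustration under stated assumptions: `Backend`, `Case`, `DEFAULT_TEMPLATE`, `normalize`, and `accuracy` are hypothetical names that do not come from the AgentClinic codebase, the evaluation is reduced to a single turn (AgentClinic itself is a multi-agent dialogue), and the substring match is a stand-in for whatever scoring the benchmark actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

# Hypothetical interface: any provider (a vLLM server, OpenAI, Anthropic,
# Gemini) would be wrapped in a function that maps a prompt to reply text.
Backend = Callable[[str], str]


@dataclass
class Case:
    """One benchmark item (hypothetical shape, not the AgentClinic schema)."""
    template_values: Dict[str, str]  # values substituted into the template
    answer: str                      # reference diagnosis


# Assumed default prompt; callers can pass their own template instead.
DEFAULT_TEMPLATE = (
    "Patient presentation: {presentation}\n"
    "State the single most likely diagnosis."
)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is less brittle."""
    return " ".join(text.lower().split())


def accuracy(
    backend: Backend,
    cases: Iterable[Case],
    template: str = DEFAULT_TEMPLATE,
) -> float:
    """Fraction of cases whose reference answer appears in the model reply.

    Substring matching is a simplification chosen for this sketch; it is
    not claimed to be the benchmark's own grading method.
    """
    case_list = list(cases)
    if not case_list:
        return 0.0
    correct = 0
    for case in case_list:
        reply = backend(template.format(**case.template_values))
        if normalize(case.answer) in normalize(reply):
            correct += 1
    return correct / len(case_list)
```

With this shape, switching between a local model and a hosted one only changes the function passed as `backend`, and the MedQA and NEJM variants would differ only in how their `Case` lists are loaded and which template is supplied.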


Codebase: https://github.com/SamuelSchmidgall/AgentClinic

Using Prime Intellect Environments (https://github.com/PrimeIntellect-ai/prime-environments) is highly recommended.

AgentClinic #12