Skip to content

AgentClinic #12

@tmabraham

Description

@tmabraham

Implement a lightweight, self-contained script that runs and reports accuracy scores on AgentClinic benchmark

Requirements:

  1. supports open-source local models with vLLM
  2. also supports a variety of closed-source models like OpenAI, Anthropic, Gemini
  3. Flexibility with prompts/chat templates/etc.
  4. Implement the MedQA and NEJM versions of the benchmark

Codebase: https://github.com/SamuelSchmidgall/AgentClinic

Highly recommend using Prime Intellect Environments (https://github.com/PrimeIntellect-ai/prime-environments)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions