
K-QA #45 added #52

Open

Manishram-ai wants to merge 19 commits into MedARC-AI:main from Manishram-ai:main

Conversation

@Manishram-ai (Contributor)

No description provided.

@warner-benjamin (Collaborator) left a comment


Thanks for the PR. Could you address these two comments? Also, make sure to add the license at the top of the files if you reused any of the author's code.

@warner-benjamin (Collaborator)

With the latest released version of verifiers (0.1.5.post0), the current code doesn't run:

verifiers.rubrics.JudgeRubric - ERROR - Error calling reward function <lambda>: module 'verifiers.utils' has no attribute 'ensure_async'

Also, when using a judge model, you need to support different API keys and URLs for each model. Look at healthbench as an example.

@Manishram-ai (Contributor, Author) commented Oct 17, 2025

This PR adds:

  • k_qa_batched: an environment that mirrors k_qa but scores comprehensiveness and hallucination in a single judge call.

  • Scoring rubric: uses batch_eval_prompt to obtain both:

    • comprehensiveness.entailed_must_have_claims
    • hallucination.contradictory_generated_claims
  • Metrics:

    • comprehensiveness = entailed_must_have_claims_count / must_have_count
    • hallucination_count = number of contradictory generated claims
  • Small bug fixes, such as a broken f-string.
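The metric computation above can be sketched as follows. Note the function name, the JSON response shape, and the mocked judge output are illustrative assumptions for this sketch, not code from this PR:

```python
import json

def score_batched_judgment(judge_json: str, must_have_count: int):
    """Compute both metrics from a single batched judge response.

    Assumes (hypothetically) that the judge returns JSON with keys
    comprehensiveness.entailed_must_have_claims and
    hallucination.contradictory_generated_claims, each a list of claims.
    """
    parsed = json.loads(judge_json)
    entailed = parsed["comprehensiveness"]["entailed_must_have_claims"]
    contradictory = parsed["hallucination"]["contradictory_generated_claims"]

    # comprehensiveness = entailed must-have claims / total must-have claims
    comprehensiveness = len(entailed) / must_have_count if must_have_count else 0.0
    # hallucination_count = number of generated claims judged contradictory
    hallucination_count = len(contradictory)
    return comprehensiveness, hallucination_count

# Example with a mocked judge response (not a real model output)
response = json.dumps({
    "comprehensiveness": {"entailed_must_have_claims": ["claim A", "claim B"]},
    "hallucination": {"contradictory_generated_claims": ["claim X"]},
})
comp, halluc = score_batched_judgment(response, must_have_count=4)
```

With the mocked response, `comp` is 0.5 (2 of 4 must-have claims entailed) and `halluc` is 1.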

@warner-benjamin mentioned this pull request on Oct 21, 2025.