
K-QA #45 added #52

Open

Manishram-ai wants to merge 19 commits into MedARC-AI:main from Manishram-ai:main

Conversation

@Manishram-ai (Contributor)

No description provided.

@warner-benjamin (Collaborator) left a comment


Thanks for the PR. Could you address these two comments? Also, make sure to add the license at the top of the files if you reused any of the author's code.

@warner-benjamin (Collaborator)

With the latest released version of verifiers (0.1.5.post0), the current code doesn't run:

verifiers.rubrics.JudgeRubric - ERROR - Error calling reward function <lambda>: module 'verifiers.utils' has no attribute 'ensure_async'

Also, when using a judge model, you need to support different API keys and URLs for each model. Look at healthbench as an example.

@Manishram-ai (Contributor, Author) commented Oct 17, 2025

This PR adds:

  • k_qa_batched: an environment that mirrors k_qa but scores comprehensiveness and hallucination in a single judge call.

  • Scoring rubric: uses batch_eval_prompt to obtain both:

    • comprehensiveness.entailed_must_have_claims
    • hallucination.contradictory_generated_claims
  • Metrics:

    • comprehensiveness = entailed_must_have_claims_count / must_have_count
    • hallucination_count = number of contradictory generated claims
  • Small bug fixes, such as a broken f-string.
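The metric computation above can be sketched as follows. Note the function name, the JSON response shape, and the mocked judge output are illustrative assumptions for this sketch, not code from this PR:

```python
import json

def score_batched_judgment(judge_json: str, must_have_count: int):
    """Compute both metrics from a single batched judge response.

    Assumes (hypothetically) that the judge returns JSON with keys
    comprehensiveness.entailed_must_have_claims and
    hallucination.contradictory_generated_claims, each a list of claims.
    """
    parsed = json.loads(judge_json)
    entailed = parsed["comprehensiveness"]["entailed_must_have_claims"]
    contradictory = parsed["hallucination"]["contradictory_generated_claims"]

    # comprehensiveness = entailed must-have claims / total must-have claims
    comprehensiveness = len(entailed) / must_have_count if must_have_count else 0.0
    # hallucination_count = number of generated claims judged contradictory
    hallucination_count = len(contradictory)
    return comprehensiveness, hallucination_count

# Example with a mocked judge response (not a real model output)
response = json.dumps({
    "comprehensiveness": {"entailed_must_have_claims": ["claim A", "claim B"]},
    "hallucination": {"contradictory_generated_claims": ["claim X"]},
})
comp, halluc = score_batched_judgment(response, must_have_count=4)
```

With the mocked response, `comp` is 0.5 (2 of 4 must-have claims entailed) and `halluc` is 1.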

@warner-benjamin mentioned this pull request on Oct 21, 2025.