@kmazrolina

New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark

  • Added metric definitions that use llama-3-3-70b as the judge in the Arena Hard benchmark, with support for:
    • WML Inference Engine
    • Generic Inference Engine

Signed-off-by: karolina.zrobek <[email protected]>
@kmazrolina kmazrolina marked this pull request as draft October 27, 2025 13:46
@kmazrolina kmazrolina marked this pull request as ready for review October 27, 2025 13:46