New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark #1949

kmazrolina · 2025-10-27T13:32:39Z

Adds metric definitions that use llama-3-3-70b as the judge in the Arena Hard benchmark, with support for:
- WML Inference Engine
- Generic Inference Engine

Commit ee613cf: New metric definitions for llama-3-3-70b as judge in Arena Hard benchmark

* Added metric definitions for llama-3-3-70b as judge in Arena Hard benchmark supporting:
  - WML Inference Engine
  - Generic Inference Engine

Signed-off-by: karolina.zrobek <[email protected]>

kmazrolina marked this pull request as draft October 27, 2025 13:46

kmazrolina marked this pull request as ready for review October 27, 2025 13:46
