Repositories list
49 repositories
SWE-bench-server (Public)
Terminal-Bench-server (Public)
SearchAgentService (Public): Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
CNFinBench (Public)
opencompass (Public): OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc) over 100+ datasets…
GenEditEvalKit (Public)
pinchbench_server (Public)
TextEdit (Public)
GTA (Public): [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
MiroFlow (Public)
RePro (Public): [ICLR 2026] Rectifying LLM Thought From Lens of Optimization
SAGA (Public)
ATLAS (Public)
OASIS (Public)
InteractScience (Public)
CognitiveKernel-Pro (Public)
GAOKAO-Eval (Public)
.github (Public)
MMBench-GUI (Public): Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent with a hierarchical mann…
ReasonZoo (Public)
CompassVerifier (Public)
GPassK (Public): [ACL 2025] Are Your LLMs Capable of Stable Reasoning?
Creation-MMBench (Public)
CompassJudger (Public)
RaML (Public)
BotChat (Public)
Ada-LEval (Public): The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
MathBench (Public)
MMBench (Public)