What's Changed
- Support for specifying a model service API URL for evaluation: models can be evaluated through either a local or a remote model service.
- Support for a custom schema for mixed-data evaluation: combine different datasets to assess model capabilities more comprehensively with less data.
- Added benchmark contribution guidelines: users can add their own benchmarks, extending the tool and making it useful to more people.
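
To illustrate the idea behind mixed-data evaluation, here is a minimal Python sketch of drawing a small evaluation set from several datasets in proportion to per-dataset weights. It is not the tool's actual schema format or API: the function `sample_mixed`, the `schema` dictionary of weights, and the dataset names are all hypothetical and chosen only for illustration.

```python
import random


def sample_mixed(schema, datasets, total, seed=0):
    """Draw roughly `total` samples across datasets, proportional to weights.

    schema:   hypothetical mapping of dataset name -> relative weight
    datasets: mapping of dataset name -> list of samples
    """
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    weight_sum = sum(schema.values())
    mixed = []
    for name, weight in schema.items():
        # Each dataset's quota is its share of the total budget.
        quota = round(total * weight / weight_sum)
        pool = datasets[name]
        # Never ask for more samples than the dataset holds.
        picked = rng.sample(pool, min(quota, len(pool)))
        mixed.extend((name, item) for item in picked)
    return mixed


# Hypothetical datasets: 100 placeholder samples each.
datasets = {
    "math": [f"math-{i}" for i in range(100)],
    "qa": [f"qa-{i}" for i in range(100)],
}
schema = {"math": 3, "qa": 1}  # math is weighted three times as heavily as qa
mixed = sample_mixed(schema, datasets, total=20)
```

With these weights, the 20-sample budget is split 15 to 5, so a small mixed set still covers both datasets.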
Full Changelog: v0.8.2...v0.9.0