✨ feat: introduce mcpmark-agent #178

xyliugo · 2025-09-01T10:14:44Z

Change Type

Description of Change

Switched to LiteLLM, which is simpler and more flexible and general than the OpenAI Agents SDK for testing MCP capabilities.
Started building a new agent (mcpmark-agent) on top of LiteLLM.
Dropped Stream mode since we already handle looped tool-calling.
Added reasoning_effort support (comes built-in with LiteLLM).
Got rid of the heavy OpenAI Agents SDK dependency.
Handled Anthropic extended thinking + tool use logic

Additional Information

Co-authored-by: zjwu0522 <[email protected]>

…tion success as final success

github-actions · 2025-09-01T10:21:52Z

🐳 Docker Build Completed!

Version: pr-dev-9d99679
Build Time: 2025-09-01T10:35:29.572Z
🔗 View all tags on Docker Hub: https://hub.docker.com/r/evalsysorg/mcpmark/tags

Pull Image

Download the Docker image to your local machine:

docker pull evalsysorg/mcpmark:pr-dev-9d99679

Run Eval

Execute evaluation tasks using the built image:

DOCKER_IMAGE_VERSION=pr-dev-9d99679 ./run-task.sh --models gpt-4.1-mini --tasks file_context/uppercase

Important

This build is for testing and validation purposes.

arvinxx

lgtm

xyliugo and others added 16 commits August 31, 2025 14:04

feat: replace openai agents with litellm + looped tool calling (#166)

b547e57

Co-authored-by: zjwu0522 <[email protected]>

feat: reconstruct log file structure (#168)

4d5ac6a

fix: handle null reasoning token (#169)

7c1a043

fix: decrease workflow waiting time, fix max turn raise, use verifica…

457b3fe

…tion success as final success

fix: safe result reporter (#172)

5bccba4

✨ feat: improve mcpmark-agent (#173)

e7ec41c

feat: add litellm dependency (#171)

434cd62

feat: improve notion tool calling in gemini (#174)

9725d5a

feat: improve output tokens calculation (#175)

05ea89c

fix: correct model run name (#176)

92ca25a

minor

f3d9e2d

minor

5a6093c

chore: update readme

b0770df

Merge branch 'dev' of https://github.com/eval-sys/MCPArena into dev

b99bafc

minor

cfa5e43

minor

1113a2a

xyliugo changed the title ~~✨ feat:~~ ✨ feat: introduce mcpmark-agent Sep 1, 2025

Merge branch 'main' into dev

92c65d4

zjwu0522 requested a review from arvinxx September 1, 2025 10:17

zjwu0522 added the Build Docker label Sep 1, 2025

chore: remove bad reasoning token logic in claude

05bb00b

arvinxx approved these changes Sep 1, 2025

View reviewed changes

xyliugo merged commit f350cd4 into main Sep 1, 2025
3 checks passed

xyliugo deleted the dev branch September 1, 2025 13:02

zjwu0522 mentioned this pull request Sep 26, 2025

[Feature Request] Add separate leaderboard tracks for "Thinking" and "Non-Thinking" evaluations #157

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

✨ feat: introduce mcpmark-agent #178

✨ feat: introduce mcpmark-agent #178

Uh oh!

xyliugo commented Sep 1, 2025

Uh oh!

github-actions bot commented Sep 1, 2025 •

edited

Loading

Uh oh!

arvinxx left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

✨ feat: introduce mcpmark-agent #178

✨ feat: introduce mcpmark-agent #178

Uh oh!

Conversation

xyliugo commented Sep 1, 2025

Change Type

Description of Change

Additional Information

Uh oh!

github-actions bot commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐳 Docker Build Completed!

Pull Image

Run Eval

Uh oh!

arvinxx left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Sep 1, 2025 •

edited

Loading