
Has Anyone Successfully Run the SWE-Lancer Benchmark? #49

Open
ShaneTian opened this issue Feb 28, 2025 · 5 comments

@ShaneTian

ShaneTian commented Feb 28, 2025

Description:
I am trying to run the SWE-Lancer benchmark on my system and would like to confirm whether others have completed the process successfully.

System Details:

  • OS: CentOS Linux 7 64bit / Linux 4.14.0_1-0-0-51
  • CPU: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 3.10/0.00GHz, 96 cores
  • MEM: 377GB
  • Disk: 8 * 7.3TB
  • Docker: 27.0.0-rc.2

Steps Taken

✅ Step 1: Environment Setup (Success)

uv sync
source .venv/bin/activate
for proj in nanoeval alcatraz nanoeval_alcatraz; do
  uv pip install -e project/"$proj"
done
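Before running the editable installs in the loop above, it may help to confirm that each project directory exists. Here is a minimal Python sketch. It assumes the script runs from the repository root and that the three packages live under `project/`, as the loop implies. The helper name `missing_projects` is hypothetical.

```python
from pathlib import Path

# Project names taken from the install loop in Step 1.
PROJECTS = ("nanoeval", "alcatraz", "nanoeval_alcatraz")


def missing_projects(repo_root: Path, projects=PROJECTS) -> list[str]:
    """Return the names whose project/<name> directory does not exist under repo_root."""
    return [name for name in projects if not (repo_root / "project" / name).is_dir()]


if __name__ == "__main__":
    missing = missing_projects(Path.cwd())
    if missing:
        print("Missing project directories:", ", ".join(missing))
    else:
        print("All project directories found.")
```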

✅ Step 2: Docker Build (Success)

docker buildx build \
  -f Dockerfile_x86 \
  --platform linux/amd64 \
  --ssh default=$SSH_AUTH_SOCK \
  --network host \
  -t swelancer \
  .
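The build passes `--ssh default=$SSH_AUTH_SOCK`, so it depends on an SSH agent socket being available in the calling shell. If the variable is unset, the flag expands to `default=` with an empty path, which is probably not what the build expects. Here is a small Python preflight sketch that assumes a standard ssh-agent setup. The function name `check_ssh_agent` is hypothetical.

```python
import os
import stat


def check_ssh_agent(env=None) -> str:
    """Return the agent socket path from SSH_AUTH_SOCK, or raise if it looks unusable."""
    env = os.environ if env is None else env
    sock = env.get("SSH_AUTH_SOCK", "")
    if not sock:
        raise RuntimeError("SSH_AUTH_SOCK is not set; start ssh-agent and add a key first")
    try:
        mode = os.stat(sock).st_mode
    except FileNotFoundError:
        raise RuntimeError(f"SSH_AUTH_SOCK points to a missing path: {sock}") from None
    if not stat.S_ISSOCK(mode):
        # The path exists but is a regular file, device, etc., not a UNIX socket.
        raise RuntimeError(f"SSH_AUTH_SOCK is not a socket: {sock}")
    return sock
```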

✅ Step 3: Running Container (Success)

Using the ISSUE_ID=1 environment variable, as suggested in #44 (comment)

docker run -itd --name swelancer-runtime -p 5900:5900 -p 5901:5901 -e ISSUE_ID=1 swelancer
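The container reads the task from the `ISSUE_ID` environment variable, so the `docker run` command above can be parameterised per issue. Here is a minimal Python sketch that only assembles the argument list. The helper name and its defaults are hypothetical. The container name, ports and image tag are copied from the command above.

```python
import shlex


def docker_run_args(issue_id: int, name: str = "swelancer-runtime",
                    image: str = "swelancer", ports=(5900, 5901)) -> list[str]:
    """Build the Step 3 `docker run` argv for a given ISSUE_ID."""
    args = ["docker", "run", "-itd", "--name", name]
    for port in ports:
        args += ["-p", f"{port}:{port}"]  # publish the same port on host and container
    args += ["-e", f"ISSUE_ID={issue_id}", image]
    return args


# Print for inspection; the list could also be passed to subprocess.run(...).
print(shlex.join(docker_run_args(1)))
```

Running a second container at the same time would need a different `--name` and different host ports.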

❓ Step 4: Running SWE-Lancer (Partial Failure)

uv run python run_swelancer.py --issue_ids 1 2 3 4 5 6 7 8 9 10 11
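The `--issue_ids` list above is just the integers 1 through 11. Generating it makes it easy to rerun a subset, such as only the issues that raised errors. Here is a small Python sketch. The helper name is hypothetical, and it assumes `run_swelancer.py` accepts any space-separated list of IDs, as in the command above.

```python
def run_swelancer_args(issue_ids) -> list[str]:
    """Build the Step 4 command line for the given issue IDs."""
    return ["uv", "run", "python", "run_swelancer.py", "--issue_ids", *map(str, issue_ids)]


# The first 11 issues, as in the command above.
print(" ".join(run_swelancer_args(range(1, 12))))
```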

I used the gpt-4o-2024-11-20 model and ran the first 11 issues. Each issue takes roughly 20 minutes to an hour.
Seven issues (1, 2, 4, 5, 8, 10, 11) ran to completion, but all of them resulted in failures. The other four issues (3, 6, 7, 9) did not run properly and raised errors.
I am not sure whether something is wrong with my setup.
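Tallying the reported outcomes makes the run's status explicit, and the per-issue timing gives a rough sense of total wall-clock time. Here is a small Python sketch using only the numbers reported above. It assumes every issue falls in the 20 to 60 minute range and that issues run one after another.

```python
# Outcomes reported above for the first 11 issues.
completed = [1, 2, 4, 5, 8, 10, 11]  # ran to the end, all results were failures
errored = [3, 6, 7, 9]               # raised errors
resolved: list[int] = []             # none of the completed runs passed

total = len(completed) + len(errored)
print(f"completed {len(completed)}/{total}, errored {len(errored)}/{total}, "
      f"resolved {len(resolved)}/{total}")

# Rough sequential wall-clock bounds, assuming 20 to 60 minutes per issue.
low_min, high_min = 20 * total, 60 * total
print(f"{low_min}-{high_min} minutes, about {low_min / 60:.1f}-{high_min / 60:.1f} hours")
```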

Thanks in advance! 🚀

@Lucky-w0y

Hello, I am also running SWE-Lancer, but my user-tool does not run successfully. Did you encounter this error when the agent calls the user-tool?

bash: cannot set terminal process group (17647): Inappropriate ioctl for device
bash: no job control in this shell
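This message typically appears when bash runs in interactive mode (`bash -i`) without a controlling terminal, for example through `docker exec` without `-t`. It is usually a warning, and bash still runs the command without job control. Here is a minimal Python sketch that reproduces the no-TTY condition and checks for the warning. It assumes the user-tool shells out to bash, which the thread does not confirm. The helper names are hypothetical.

```python
import shutil
import subprocess


def run_in_bash(command: str, interactive: bool = False) -> subprocess.CompletedProcess:
    """Run `command` with bash, detached from any controlling terminal.

    start_new_session=True puts the child in a new session with no TTY,
    roughly like a tool that shells out from a process without a terminal.
    """
    args = ["bash"]
    if interactive:
        args.append("-i")  # without a TTY, interactive bash cannot set up job control
    args += ["-c", command]
    return subprocess.run(
        args,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        start_new_session=True,
        timeout=30,
    )


def has_job_control_warning(stderr: str) -> bool:
    """Detect the warning quoted in the comment above."""
    return "no job control in this shell" in stderr


if __name__ == "__main__" and shutil.which("bash"):
    for interactive in (False, True):
        result = run_in_bash("echo ok", interactive=interactive)
        print(f"interactive={interactive}: stdout={result.stdout.strip()!r}, "
              f"job-control warning={has_job_control_warning(result.stderr)}")
```

If the warning appears but the command output is still correct, the user-tool failure probably has a different cause.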

@BoxiYu
Contributor

BoxiYu commented Mar 3, 2025

Hi guys, I have successfully run the SWE-Lancer examples. If you are also on an x86 architecture, you can download my pre-built Docker image here: https://hub.docker.com/repository/docker/cccav/swelancer_x86/general.

Wish you good luck!
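The linked image was built for x86, so it is worth confirming the host architecture before pulling it. Here is a minimal Python sketch using the standard `platform.machine()` call, which returns strings such as `x86_64` on Linux and `AMD64` on Windows. The helper name is hypothetical.

```python
import platform


def is_x86_64(machine=None) -> bool:
    """True if the (given or current) machine string denotes x86-64."""
    machine = (machine or platform.machine()).lower()
    return machine in {"x86_64", "amd64"}


if __name__ == "__main__":
    if is_x86_64():
        print("x86-64 host: an x86 image should run natively.")
    else:
        print(f"{platform.machine()} host: an x86 image would likely need emulation "
              "(e.g. docker run --platform linux/amd64).")
```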

@moresearch
Copy link

@BoxiYu That's great! Could you give a rough idea of the cost of running the examples? And why x86?

@BoxiYu
Contributor

BoxiYu commented Mar 4, 2025

@moresearch Hi, I only ran the two examples, at roughly 2 dollars each on average, though I did not record it precisely. The image at the link was built on an x86 cloud server, and I have used it without problems on my x86 laptop. I could not build it on my ARM device (possibly due to a network error).
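As a back-of-the-envelope check, cost scales roughly linearly with the number of issues if a per-issue average holds. Here is a trivial Python sketch. It reads the ~$2 figure above as a per-example average and assumes it transfers to other runs. That assumption is doubtful, because token usage depends on the model and on how long each agent loop runs.

```python
def estimate_cost(n_issues: int, avg_cost_per_issue: float) -> float:
    """Linear cost estimate in USD: number of issues times average cost per issue."""
    return n_issues * avg_cost_per_issue


# ~$2 per example is the rough figure from the comment above (two examples,
# unrecorded precisely); anything extrapolated from it is only a guess.
print(f"2 examples: ~${estimate_cost(2, 2.0):.0f}")
print(f"11 issues:  ~${estimate_cost(11, 2.0):.0f}")
```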

@moresearch

@BoxiYu Do you think we could collectively, as a community, gather cost data per model and agent implementation? Would you be interested in participating in such an endeavour?
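One way to make community cost data comparable is to agree on a small record format and aggregate it per model and agent implementation. Here is a Python sketch of that idea. The `CostRecord` fields and the `summarize` helper are hypothetical and not part of SWE-Lancer.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CostRecord:
    """One hypothetical community-submitted data point."""
    model: str       # e.g. "gpt-4o-2024-11-20"
    agent: str       # name of the agent implementation
    issue_id: int
    cost_usd: float
    resolved: bool


def summarize(records):
    """Group records by (model, agent); report run count, mean cost and resolve rate."""
    groups = defaultdict(list)
    for r in records:
        groups[(r.model, r.agent)].append(r)
    return {
        key: {
            "runs": len(rs),
            "mean_cost_usd": mean(r.cost_usd for r in rs),
            "resolve_rate": sum(r.resolved for r in rs) / len(rs),
        }
        for key, rs in groups.items()
    }
```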

4 participants