Create task for astroid PR-2496 #177

Open
danielzayas wants to merge 2 commits into SWE-bench:main from danielzayas:astroid-pr-2496
@danielzayas (Contributor) commented Nov 29, 2025

Background

The SWE-bench/SWE-smith dataset on HuggingFace already contains 81 tasks from pylint-dev/astroid built via the PR Mirroring method. I wanted to try creating a new task for an existing repo, so I picked a PR that did not already have a task and followed the CONTRIBUTING.md instructions ➡️ proposed a new task.

RE "upload the zipped file as a new PR", I can't create a PR without a diff versus main, so I force-added unzipped SWE-smith/logs/ files to the commit + uploaded the corresponding ZIP files (linked below). Given the .gitignore, we don't actually want to merge the SWE-smith/logs/ changes to main.

Proposed Change

Create a new task for astroid PR-2496 (pylint-dev/astroid#2496) via the PR Mirroring method as per SWE-smith/docs/guides/create_instances.md.

https://github.com/SWE-bench/SWE-smith/blob/main/CONTRIBUTING.md#create-task-instances ➡️ "Zip this folder" ➡️ "bug_gen.zip" file:

bug_gen.zip

Test Plan

Ran swesmith/harness/valid.py ➡️ logs/run_validation/pylint-dev__astroid.b114f6b5/ output ➡️ "run_validation.zip" file output:

run_validation.zip

Ran swesmith/harness/eval.py ➡️ logs/run_evaluation/eval_astroid_2496/ output ➡️ "run_evaluation.zip" file output:

run_validation.zip

Ran swesmith/harness/gather.py ➡️ logs/task_insts/ ➡️ "task_insts.zip" file output:

task_insts.zip

Things I Learned (TIL)

(1) generate.py does not apply diff if > 1000 lines

SWE-smith/swesmith/bug_gen/mirror/generate.py couldn't apply the original diff directly because one of the changed files is several thousand lines long. The mirroring flow only attempts a direct git apply if every edited file satisfies the heuristics inside should_attempt_recovery, including "No changed file is >1000 lines". Falling back to the "recovery" path gave a faulty result because the LLM-produced revert missed logic. The fix was to raise that limit to 10,000 lines and re-run the pipeline: the generator could then reuse the actual PR diff, and the mirrored bug patch became bit-for-bit identical to the upstream fix ➡️ I contributed the change back to SWE-smith upstream via #178.
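To make the size heuristic concrete, here is a minimal sketch of a "no changed file exceeds N lines" check. It is not the code in should_attempt_recovery; the function name files_within_limit, the file paths, and the line counts are all hypothetical and only illustrate why moving the limit from 1,000 to 10,000 changes the outcome for a diff that touches a file several thousand lines long.

```python
# Hypothetical sketch; NOT the actual SWE-smith implementation.
def files_within_limit(line_counts: dict[str, int], max_lines: int) -> bool:
    """Return True when no changed file is longer than max_lines."""
    return all(count <= max_lines for count in line_counts.values())


# Made-up line counts for the files touched by a PR (illustrative only).
changed = {"pkg/large_module.py": 5200, "pkg/small_module.py": 340}

# With the original limit, the large file blocks the direct apply.
print(files_within_limit(changed, max_lines=1_000))   # False
# With the raised limit, every file passes the check.
print(files_within_limit(changed, max_lines=10_000))  # True
```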

(2) valid.py runs the test suite from the pylint-dev/astroid upstream, not the mirrored and pruned repo at danielzayas/astroid

Validation says, "The validation harness works in two steps. First, it 1) runs the original repository's test suite to get the passing statuses of the existing tests. Then, it 2) applies each candidate task instance to the repository and runs the test suite again". It seems like (1) runs the upstream test suite, whereas (2) runs the test suite after mirroring and pruning ➡️ some test files might not be present in the latter. As a result, the eval.py run was checking three PASS-to-PASS tests that never execute inside the container (tests/brain/test_nose.py::NoseBrainTest::test_nose_tools, tests/test_modutils.py::BackportStdlibNamesTest::test_import_error, tests/test_nodes.py::BoundMethodNodeTest::test_is_property). They appear in the valid.py logs from (1) but aren't present in the mirror snapshot used for (2), so the grading script always marks them as missing. I removed those three entries from the PASS_TO_PASS list in logs/task_insts/pylint-dev__astroid.b114f6b5.json. If those tests are later kept in the mirror instead of being pruned, they can be added back to the P2P test coverage.
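The manual cleanup described above can be sketched as a small filter. This is only an illustration: it assumes a task instance is a dict with a "PASS_TO_PASS" list of test IDs, and that the set of tests that actually ran in the container is already known. The helper name prune_missing_tests and the test ID tests/test_example.py::test_runs are hypothetical.

```python
# Hypothetical helper; assumes the instance dict has a "PASS_TO_PASS" list.
def prune_missing_tests(instance: dict, executed: set[str]) -> tuple[dict, list[str]]:
    """Split PASS_TO_PASS into tests that ran and tests that never executed."""
    kept = [t for t in instance["PASS_TO_PASS"] if t in executed]
    missing = [t for t in instance["PASS_TO_PASS"] if t not in executed]
    # Return a copy so the original instance is left untouched.
    return {**instance, "PASS_TO_PASS": kept}, missing


instance = {
    "PASS_TO_PASS": [
        "tests/test_example.py::test_runs",  # hypothetical test that does run
        "tests/brain/test_nose.py::NoseBrainTest::test_nose_tools",
        "tests/test_modutils.py::BackportStdlibNamesTest::test_import_error",
        "tests/test_nodes.py::BoundMethodNodeTest::test_is_property",
    ]
}
# Test IDs observed in the container run (illustrative).
executed = {"tests/test_example.py::test_runs"}

pruned, missing = prune_missing_tests(instance, executed)
print(len(missing))            # 3
print(pruned["PASS_TO_PASS"])  # ['tests/test_example.py::test_runs']
```

Reporting the missing IDs separately, rather than dropping them silently, makes it easier to add them back if the mirror later includes those tests.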

@danielzayas danielzayas marked this pull request as ready for review November 29, 2025 04:55
@john-b-yang (Member) commented Dec 10, 2025

Thanks so much @danielzayas! Just letting you know I saw this - will come back to this by the EOW.

@danielzayas (Contributor, Author) commented Jan 6, 2026

I wanted to try creating a new task for an existing repo, so I picked a PR that did not already have a task and followed the CONTRIBUTING.md instructions ➡️ proposed a new task

I created this PR mainly to learn about the repo and workflow described in CONTRIBUTING.md ➡️ totally fine to close this without merging to main. I don't expect that particular task instance to be world-changing lol.

The primary positive outcomes of this seem to be

  1. Fixed "(1) generate.py does not apply diff if > 1000 lines" via PR #178, "increase max file size for diff apply"
  2. Identified issue "(2) valid.py runs the test suite from the pylint-dev/astroid upstream, not the mirrored and pruned repo at danielzayas/astroid" ➡️ I opened issue #194, "valid.py runs test suite from upstream repo, not the mirrored and pruned repo"

TLDR: we can merge if we want the new task and just close if not

And thanks again for the awesome open source project!!
