SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution

Chengxing Xie*,1,2, Bowen Li*,1, Chang Gao*,1,3, He Du1, Wai Lam3, Difan Zou4, Kai Chen^,1

1Shanghai AI Laboratory, 2Xidian University, 3The Chinese University of Hong Kong, 4The University of Hong Kong
*Equal contribution, ^Corresponding author

📄 Paper


SWE-Fixer is a simple yet effective solution for addressing real-world GitHub issues by training open-source LLMs. It features a streamlined retrieve-then-edit pipeline with two core components:
🔍 A Code File Retriever and ✏️ A Code Editor.
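
Conceptually, the two components compose into a single retrieve-then-edit flow. The Python sketch below is an illustration only, not the repo's API: BM25 ranking here uses the third-party rank_bm25 package, and run_retriever / run_editor are hypothetical callables standing in for inference with the two finetuned models.

from rank_bm25 import BM25Okapi

def bm25_candidates(issue_text, files, k=5):
    # files: {path: file content}; coarse first-pass ranking against the issue text
    paths = list(files)
    bm25 = BM25Okapi([files[p].split() for p in paths])
    scores = bm25.get_scores(issue_text.split())
    return [p for p, _ in sorted(zip(paths, scores), key=lambda x: -x[1])[:k]]

def resolve_issue(issue_text, files, run_retriever, run_editor):
    candidates = bm25_candidates(issue_text, files)        # 1. cheap BM25 shortlist
    defective = run_retriever(issue_text, candidates)      # 2. retriever picks defective files
    return run_editor(issue_text, {p: files[p] for p in defective})  # 3. editor emits a patch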

For implementation, we finetune Qwen2.5-7B and Qwen2.5-72B for the retriever and the editor, respectively, on a curated dataset of 110K instances. SWE-Fixer achieves state-of-the-art performance among solutions built on open-source models, scoring:

  • 🔹 23.3% on SWE-Bench Lite
  • 🔹 30.2% on SWE-Bench Verified

Models and Datasets

Models:

  • 🤗 SWE-Fixer-Retriever-7B
    🔍 Finetuned for the code file retrieval task, this model takes issue descriptions and BM25-retrieved results as input and identifies the defective files related to the issue.

  • 🤗 SWE-Fixer-Editor-72B
    ✏️ Designed for the code editing task, this model processes issue descriptions and corresponding file content to generate modification patches for resolving the issue.
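
Both checkpoints are standard causal language models, so they load with the usual Hugging Face transformers calls. This is only a loading sketch: the actual prompt templates the models expect are defined by the repo's inference scripts, and the placeholder prompt below is not the real format.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/SWE-Fixer-Retriever-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Placeholder input: a real prompt combines the issue text with BM25-retrieved results.
prompt = "..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))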

Datasets:

  • 🤗 SWE-Fixer-Train-110K
    📂 This dataset contains nearly 110K detailed instances collected from real-world GitHub repositories and serves as the training data for our pipeline.

  • 🤗 SWE-Fixer-Eval
    📊 This evaluation dataset includes the SWE-Bench Lite and Verified instances, BM25 retrieval results for both benchmarks, and the code structure for each instance, enabling convenient evaluation.
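
Both datasets can also be pulled programmatically with the datasets library. A minimal sketch, assuming the training set exposes a default train split; print the first record's keys to see the actual schema:

from datasets import load_dataset

train = load_dataset("internlm/SWE-Fixer-Train-110K", split="train")
print(len(train))        # ~110K instances
print(train[0].keys())   # inspect the per-instance fields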

Run the Pipeline

Environment Setup

Download and install our inference environment package SWE_Fixer.tar.gz. Use the following commands:

mkdir -p {your_conda_environment_dir}/SWE_Fixer
tar -xzvf SWE_Fixer.tar.gz -C {your_conda_environment_dir}/SWE_Fixer

Activate the environment:

conda activate SWE_Fixer

Prepare Models and Evaluation Datasets

Download the models and datasets and save them to the default locations:

mkdir model
huggingface-cli login
huggingface-cli download --resume-download internlm/SWE-Fixer-Retriever-7B --local-dir ./model/retrieval_model
huggingface-cli download --resume-download internlm/SWE-Fixer-Editor-72B --local-dir ./model/editing_model
huggingface-cli download internlm/SWE-Fixer-Eval --repo-type dataset --local-dir ./eval_data

Alternatively, specify custom paths by modifying MODEL_DIR and EVAL_DATA_DIR in scripts/run_evaluation.sh.
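
If you prefer scripting the downloads, huggingface_hub's snapshot_download performs the same steps from Python (same repo IDs and local directories as the commands above):

from huggingface_hub import snapshot_download

snapshot_download("internlm/SWE-Fixer-Retriever-7B", local_dir="./model/retrieval_model")
snapshot_download("internlm/SWE-Fixer-Editor-72B", local_dir="./model/editing_model")
snapshot_download("internlm/SWE-Fixer-Eval", repo_type="dataset", local_dir="./eval_data")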

Run the Retrieval Model

Run the retrieval pipeline (defaults to the lite dataset):

scripts/run_evaluation.sh --mode retrieval

To use the verified dataset, execute:

scripts/run_evaluation.sh --mode retrieval --dataset verified

Retrieval results will be saved in the result directory.

Run the Editing Model

After completing the retrieval step, run the editing pipeline based on the retrieval results:

scripts/run_evaluation.sh --mode editing

To use the verified dataset, execute:

scripts/run_evaluation.sh --mode editing --dataset verified

Editing results will also be saved in the result directory.

Evaluate the Results

We evaluate the pipeline results using the All Hands (OpenHands) evaluation harness. Refer to the evaluation guide in the OpenHands repository.
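
For reference, SWE-Bench-style harnesses typically consume a JSONL predictions file whose records carry instance_id, model_name_or_path, and model_patch. The converter below is a hypothetical sketch: the input field names ("instance_id", "patch") and file paths are assumptions, so adjust them to the actual schema of the files in the result directory.

import json

def to_predictions(result_path, out_path, model_name="SWE-Fixer"):
    # Assumes each result line is a JSON object with "instance_id" and "patch"
    # fields; adjust the keys to match the actual result schema.
    with open(result_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            rec = json.loads(line)
            fout.write(json.dumps({
                "instance_id": rec["instance_id"],
                "model_name_or_path": model_name,
                "model_patch": rec["patch"],
            }) + "\n")

# Hypothetical file names; point these at your actual result files.
to_predictions("result/editing_results.jsonl", "predictions.jsonl")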

Notes

  • Ensure all scripts are executable. Use chmod +x if necessary.
  • The exact paths for scripts and datasets must be updated to match your local setup.
  • If you encounter issues during deployment or execution, refer to the respective repositories and documentation.
  • The inference results may vary depending on your device or settings.

Citation

@article{xie2025swefixer,
  title={SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution}, 
  author={Xie, Chengxing and Li, Bowen and Gao, Chang and Du, He and Lam, Wai and Zou, Difan and Chen, Kai},
  journal={arXiv preprint arXiv:2501.05040},
  year={2025}
}

Acknowledgements