ReAPR

Here is the open-source code repository for the paper "ReAPR: Automatic Program Repair via Retrieval-Augmented Large Language Models."

It is structured as follows:

  • BM25 contains the implementation of the BM25 sparse retrieval algorithm.
  • DPR contains the implementation of DPR (Dense Passage Retrieval), covering retriever training, embedding of the retrieval corpus into dense vectors, and similarity-based retrieval.
  • Dataset contains the processing logic for the two benchmarks, as well as the patch validation logic for each of them.
  • Defects4j contains the single-function bugs we extracted from Defects4J 2.0.
  • Gitbug-java contains the single-function bugs we extracted from GitBug-Java.
  • repair.py contains the entire repair pipeline and serves as the main entry point.
  • requirements.txt lists the Python dependencies required to run the project.

Guide

1. Install Dependencies

To run this code, you first need to install the project dependencies and two benchmarks: Defects4J and GitBug-Java.
To install the dependencies, run the following command:

pip install -r requirements.txt

2. BM25 Algorithm Implementation

After downloading the retrieval corpus from the Hugging Face repository (see Retrieval Corpus below) to a local directory, run the following commands to perform BM25 retrieval.

cd BM25
python3 search_bm25.py --search_corpus <retrieval_corpus_path> --query_str <query_string> --same_name <results_save_path> --temp_path <temp_path>
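The internals of search_bm25.py are not shown in this README. As a hedged illustration of what BM25 scoring does, here is a minimal, self-contained Okapi BM25 ranker in Python. The tokenizer, the `BM25` class, and the k1 = 1.5 / b = 0.75 defaults are assumptions chosen for the sketch, not the repository's code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then keep identifier-like tokens and integer literals.
    return re.findall(r"[a-z_][a-z0-9_]*|\d+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory corpus (k1 and b are common defaults, assumed here)."""

    def __init__(self, corpus: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in corpus]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: how many documents contain each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # IDF variant with "+1" inside the log so weights stay non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.tf[idx], self.doc_len[idx]
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / denom
        return s

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda p: p[1], reverse=True)[:k]


corpus = [
    "if (x == null) return 0;",
    "for (int i = 0; i < n; i++) sum += a[i];",
    "return a.compareTo(b) < 0;",
]
bm25 = BM25(corpus)
print(bm25.top_k("x == null", k=1))  # document 0 ranks first
```

Only the first document contains both query tokens `x` and `null`, so it is the single BM25 hit; the other two score zero.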

3. DPR Algorithm Implementation

First, run the following command to train a dense retriever.

cd DPR
accelerate launch Train_dpr_retriever.py
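Train_dpr_retriever.py is not reproduced here. For orientation, DPR trains its encoders with a contrastive objective over in-batch negatives: each query is scored against every passage in the batch by inner product, and the loss is the negative log-likelihood of its own positive passage. The pure-Python sketch below computes that loss on precomputed vectors. Treating buggy code as the query and its fix as the positive passage is an assumption based on the fix-embedding and `--bug_str` search steps, and `in_batch_nll` is a hypothetical name.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def in_batch_nll(query_vecs: list[list[float]], passage_vecs: list[list[float]]) -> float:
    """DPR-style loss: passage i is the positive for query i; the other passages in the batch act as negatives."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # equals -log softmax(scores)[i]
    return total / len(query_vecs)


bugs = [[1.0, 0.0], [0.0, 1.0]]            # toy "buggy code" embeddings
fixes_aligned = [[1.0, 0.0], [0.0, 1.0]]   # each fix is close to its own bug
fixes_swapped = [[0.0, 1.0], [1.0, 0.0]]   # each fix is close to the wrong bug
print(in_batch_nll(bugs, fixes_aligned))   # about 0.313
print(in_batch_nll(bugs, fixes_swapped))   # about 1.313
```

In a real training loop these vectors would come from the query and passage encoders, and minimizing the loss pulls each bug toward its own fix and away from the other fixes in the batch.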

Then, use the trained dense retriever to encode the fix of each bug in the corpus into a vector, building the vector database used for retrieval.

accelerate launch fix2embedding.py --fix_path <retrieval_corpus_path> --pretrained_model_path <trained_retriever_path> --output_dir <embedding_corpus_path>
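The point of this step is that every fix is encoded once, offline, so a later search only has to encode the query. The sketch below illustrates that idea under loud assumptions: `toy_encode` is a hypothetical stand-in for the trained dense encoder (a hashed bag of tokens, L2-normalized), and `build_index`, the bug IDs, the fix strings, and the JSON serialization are all illustrative choices, not the format fix2embedding.py actually writes.

```python
import hashlib
import json
import math
import re

DIM = 64  # hypothetical size; BERT-base encoders such as the original DPR's output 768-d vectors


def toy_encode(text: str, dim: int = DIM) -> list[float]:
    """Stand-in for the trained dense encoder: a hashed bag of tokens, L2-normalized."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(fixes: dict[str, str]) -> dict[str, list[float]]:
    """Encode every fix once so that retrieval only needs to encode the query."""
    return {bug_id: toy_encode(fix) for bug_id, fix in fixes.items()}


# Toy corpus entries (hypothetical IDs and fixes, not taken from the real dataset).
fixes = {
    "bug-1": "if (str == null) { return null; }",
    "bug-2": "return Double.isNaN(x) ? 0.0 : x;",
}
index = build_index(fixes)
payload = json.dumps(index)    # in practice this would be saved under the output directory
restored = json.loads(payload)
print(sorted(restored), len(restored["bug-1"]))  # ['bug-1', 'bug-2'] 64
```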

Finally, run the following command to execute the DPR algorithm.

python3 search_dpr.py --fix_path <retrieval_corpus_path> --bug_str <query_string> --top_k <k> --embedding_dir <embedding_corpus_path> --pretrained_model_path <trained_retriever_path>
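Conceptually, the search step is a nearest-neighbour lookup: encode the bug string, then return the `--top_k` corpus entries with the highest similarity. The sketch below assumes inner-product similarity over L2-normalized vectors (DPR itself scores by inner product; the normalization is an added assumption), and the small `index` dict is a toy stand-in for the contents of the embedding directory. `search_top_k` is a hypothetical name.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def search_top_k(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Return the k entries with the highest inner product to the query (cosine, since all vectors are unit-length)."""
    q = normalize(query_vec)
    scored = ((sum(a * b for a, b in zip(q, vec)), bug_id) for bug_id, vec in index.items())
    return heapq.nlargest(k, scored)


index = {
    "bug-A": normalize([1.0, 0.0, 0.0]),
    "bug-B": normalize([0.7, 0.7, 0.0]),
    "bug-C": normalize([0.0, 0.0, 1.0]),
}
result = search_top_k([1.0, 0.1, 0.0], index, k=2)
print(result)  # bug-A ranks first, then bug-B
```

`heapq.nlargest` avoids sorting the whole corpus when k is small; for large corpora an approximate nearest-neighbour library would be the usual replacement.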

4. ReAPR Workflow Implementation

Running the following command will execute the complete ReAPR workflow.

python3 repair.py --model_name <generative_model_name> --batch_size <batch_size> --dataset <defects4j | githubjava> --retrieval_way <bm25 | dpr | no retrieval> --chances <beam_search_count>
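In a retrieval-augmented repair workflow, the retrieved bug-fix pairs are placed in the prompt before the target bug, and the generative model then produces candidate fixes (here, via beam search with `--chances` beams). The prompt layout below is a hypothetical sketch for illustration only; `RetrievedPair`, `build_repair_prompt`, and the comment-style markers are assumptions, not the template repair.py uses.

```python
from dataclasses import dataclass


@dataclass
class RetrievedPair:
    buggy: str  # buggy code of a retrieved bug
    fixed: str  # its developer fix


def build_repair_prompt(buggy_function: str, examples: list[RetrievedPair]) -> str:
    """Hypothetical prompt layout: retrieved bug-fix pairs as demonstrations, then the target bug.

    Without retrieval, `examples` is simply empty and the prompt holds only the target bug.
    """
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(f"// Example {i}: buggy\n{ex.buggy}\n// Example {i}: fixed\n{ex.fixed}\n")
    parts.append(f"// Buggy function\n{buggy_function}\n// Fixed function\n")
    return "\n".join(parts)


retrieved = [RetrievedPair("return a - b;", "return Integer.compare(a, b);")]
prompt = build_repair_prompt("int cmp(int a, int b) { return a > b ? 1 : 0; }", retrieved)
print(prompt)
```

Ending the prompt with an open "Fixed function" marker leaves the model to complete the repaired function, and each of the beam-search candidates would then go through patch validation.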

Benchmarks

Before running the program, make sure Defects4J and GitBug-Java are installed and configured properly, since candidate patches are validated on these benchmarks.
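The patch validation logic lives in Dataset and is not reproduced here. As a hedged sketch of what validating one candidate patch on Defects4J can look like, the snippet below drives the standard `defects4j compile` and `defects4j test` commands through `subprocess`. `validate_patch` and `failing_test_count` are hypothetical helpers; the sketch assumes the project was already checked out with `defects4j checkout` and the candidate patch written into the buggy source file. Only the Defects4J side is shown, since GitBug-Java uses its own tooling.

```python
import re
import subprocess
from typing import Optional


def failing_test_count(test_output: str) -> Optional[int]:
    """Parse the 'Failing tests: N' summary line printed by `defects4j test`."""
    match = re.search(r"Failing tests:\s*(\d+)", test_output)
    return int(match.group(1)) if match else None


def validate_patch(work_dir: str, timeout: int = 1800) -> bool:
    """Return True if the patched project in work_dir compiles and no tests fail.

    Assumes work_dir was created with `defects4j checkout` and already
    contains the candidate patch.
    """
    compiled = subprocess.run(
        ["defects4j", "compile", "-w", work_dir],
        capture_output=True, text=True, timeout=timeout,
    )
    if compiled.returncode != 0:
        return False  # the candidate patch does not compile
    tested = subprocess.run(
        ["defects4j", "test", "-w", work_dir],
        capture_output=True, text=True, timeout=timeout,
    )
    # Search both streams for the summary line to avoid depending on where it is printed.
    return tested.returncode == 0 and failing_test_count(tested.stdout + tested.stderr) == 0


# Only the parser is exercised here; validate_patch needs a local Defects4J install.
print(failing_test_count("Failing tests: 0"))  # 0
print(failing_test_count("Failing tests: 2\n  - org.example.FooTest::testBar"))  # 2
```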

Retrieval Corpus

The retrieval database supporting the results of this study is openly available as a Hugging Face dataset repository, accessible at: https://huggingface.co/datasets/zxliu/ReAPR-Automatic-Program-Repair-via-Retrieval-Augmented-Large-Language-Models
