This project aims to independently benchmark the performance of various ChatGPT and Gemini models on Fault Localization (FL) and Automatic Program Repair (APR) debugging tasks. The investigation is divided into two main focus areas:
- Focus Area 1: Benchmarking individual models
  - Objective: To evaluate the performance of different ChatGPT and Gemini models in debugging tasks.
  - Process: Each model is tested independently to determine which performs best in terms of accuracy.
- Focus Area 2: Multi-agent debugging
  - Objective: To implement a multi-agent system in which LLMs (Large Language Models) work together in collaborative conversations to carry out the debugging process.
  - Process: After identifying the best-performing models, they are integrated into a multi-agent framework that allows the LLMs to collaborate in conversations aimed at improving debugging performance (a minimal sketch of such a conversation is shown after the goals below).
  - Outcome: Additional testing is conducted to determine whether collaboration between LLM agents leads to better debugging results than individual models.
The primary goals of this investigation are:
- To assess the viability of LLMs in debugging tasks.
- To explore whether collaborative conversations between multiple LLM agents can improve debugging performance.
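The multi-agent conversations are built with pyautogen (installed in the setup steps below). As a rough illustration only, the sketch that follows shows how two agents might converse about a repair task; the agent names, system messages, and buggy snippet are placeholders rather than the repository's actual configuration.

```python
# Illustrative sketch only: two pyautogen agents collaborating on a repair task.
# Assumes an OAI_CONFIG_LIST.json file as described in the setup section below.
import autogen

config_list = autogen.config_list_from_json("OAI_CONFIG_LIST.json")

# One agent proposes a fix, the other reviews it; the reviewer stops
# auto-replying after two turns so the conversation terminates.
fixer = autogen.AssistantAgent(
    name="fixer",
    system_message="You repair buggy Java methods and return the corrected code.",
    llm_config={"config_list": config_list},
)
reviewer = autogen.AssistantAgent(
    name="reviewer",
    system_message="You review proposed fixes and point out any remaining defects.",
    llm_config={"config_list": config_list},
    max_consecutive_auto_reply=2,
)

buggy_method = "public int add(int a, int b) { return a - b; }"  # placeholder bug
reviewer.initiate_chat(fixer, message=f"Please repair this method:\n{buggy_method}")
```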
The code in this repository is organized into three distinct branches to streamline the benchmarking and evaluation process:
- Gemini: Benchmarking different Gemini Models
- ChatGPT: Benchmarking different GPT Models
- Multi-Agent-Conversation: Evaluating the performance of the Multi-Agent System
- EvalGPTFix_Extracted & Gemini_Extracted: the structured data extracted from the datasets used to benchmark the models.
- LLM_Scripts: the Python scripts that implement the pipeline for prompting the LLMs and generating files to store their responses (a rough sketch of this pipeline is shown after this list).
- Responses: all responses generated by the LLMs, stored in sequential order, including CSV files and the code files produced during the debugging tasks.
- Tests: validation of the results using test cases.
- Analysis: test outcomes and the general analysis of each model's performance.
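As a rough illustration of the kind of prompt-and-record pipeline in LLM_Scripts, the sketch below queries a model and appends its response to a CSV file. The model name, prompt, and file name are placeholders and do not correspond to the repository's actual scripts.

```python
# Illustrative sketch only: prompt a model and append its response to a CSV file.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def repair(buggy_code: str) -> str:
    """Ask the model for a fixed version of the given code snippet."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You fix bugs in Java code."},
            {"role": "user", "content": buggy_code},
        ],
    )
    return response.choices[0].message.content

# Store each (input, output) pair in sequential order, as in the Responses folder.
with open("responses.csv", "a", newline="") as f:
    writer = csv.writer(f)
    bug = "public int add(int a, int b) { return a - b; }"  # placeholder bug
    writer.writerow([bug, repair(bug)])
```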
To run the multi-agent system or any of the individual LLM scripts, please follow the steps below:
Ensure that you have the following installed:

- Python (version 3.7+): download and install Python from here.
- Java: download and install Java from here.
- JUnit: download the JUnit VS Code extension to run the Java tests from here.
- Python libraries: install the required packages by running the commands below.

```
pip install openai
pip install -q -U google-generativeai
pip install python-dotenv
pip install pyautogen
pip install pyautogen[gemini]
pip install pyautogen[gemini,retrievechat,lmm]
pip install pytest
pip install panel
```
Create a `.env` file in the root directory and include the following keys with your respective API credentials:

```
OPENAI_API_KEY=<Your OpenAI API Key Here>
GEMINI_API_KEY=<Your Gemini API Key Here>
```
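As a minimal sketch (assuming python-dotenv, installed above), the scripts can then load these keys from the environment; the variable names are examples only:

```python
# Illustrative sketch only: load the API keys defined in the .env file.
import os

import google.generativeai as genai
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads the .env file in the current working directory

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
```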
Create an `OAI_CONFIG_LIST.json` file in the root directory with the following structure, filling in your respective API credentials:
```json
[
    {
        "model": "gpt-4o-mini",
        "api_key": "<Your API Key Here>"
    },
    {
        "model": "gpt-4o",
        "api_key": "<Your API Key Here>"
    },
    {
        "model": "gemini-1.5-pro",
        "api_key": "<Your API Key Here>",
        "api_type": "google"
    },
    {
        "model": "gemini-1.5-flash",
        "api_key": "<Your API Key Here>",
        "api_type": "google"
    }
]
```
Make sure to replace `<Your API Key Here>` with your actual API key for each model.
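For reference, pyautogen can load and filter this file with `config_list_from_json`; the filter below is only an example and not necessarily how the repository's scripts select a model.

```python
# Illustrative sketch only: load OAI_CONFIG_LIST.json and keep one model's entry.
import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST.json",
    filter_dict={"model": ["gemini-1.5-flash"]},  # example filter
)

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```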
To run the multi-agent system using the front-end Panel interface, execute the following command:

```
panel serve Conversations_UI.py
```
Completed by Taahir Kolia and Muhammad Raees Dindar as part of the ELEN4012A Investigation.