The Bug Busters: ChatGPT and Gemini's Journey in Program Repair

Overview

This project independently benchmarks the performance of various ChatGPT and Gemini models on Fault Localization (FL) and Automatic Program Repair (APR) tasks. The investigation is divided into two main focus areas:

1. Model Benchmarking

  • Objective: To evaluate the performance of different ChatGPT and Gemini models in debugging tasks.
  • Process: Each model is independently tested to determine which performs best in terms of accuracy.

2. Multi-Agent System for Collaborative Debugging

  • Objective: To implement a multi-agent system where LLMs (Large Language Models) work together in collaborative conversations to carry out the debugging process.
  • Process: After identifying the best-performing models, they are integrated into a multi-agent framework. This system allows LLMs to collaborate in conversations aimed at improving debugging performance.
  • Outcome: Additional testing is conducted to determine if collaborative efforts between LLM agents lead to better debugging results compared to individual models.
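As a rough illustration of the collaborative loop described above (not the project's actual agents), a fixer/reviewer exchange can be sketched with plain functions standing in for the LLM calls; `propose_fix` and `review_fix` are hypothetical stubs, and no real API is contacted:

```python
# Illustrative two-agent repair loop. The "agents" are plain functions
# standing in for ChatGPT/Gemini calls; propose_fix and review_fix are
# hypothetical stubs, not code from this repository.

def propose_fix(buggy_code: str, feedback: str) -> str:
    # A fixer agent would prompt an LLM with the code and the reviewer's
    # feedback; here we stub a trivial operator repair.
    if "=<" in buggy_code:
        return buggy_code.replace("=<", "<=")
    return buggy_code

def review_fix(candidate: str) -> str:
    # A reviewer agent would ask a second LLM to critique the patch.
    return "APPROVE" if "=<" not in candidate else "operator still wrong"

def repair(buggy_code: str, max_rounds: int = 3) -> str:
    # Alternate between the two agents until the reviewer approves.
    feedback = ""
    candidate = buggy_code
    for _ in range(max_rounds):
        candidate = propose_fix(candidate, feedback)
        feedback = review_fix(candidate)
        if feedback == "APPROVE":
            break
    return candidate

print(repair("if (i =< n) { sum += i; }"))
# -> if (i <= n) { sum += i; }
```

The point of the comparison in this project is whether this kind of multi-round exchange outperforms a single model answering once.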

Goals

The primary goals of this investigation are:

  1. To assess the viability of LLMs in debugging tasks.
  2. To explore whether collaborative conversations between multiple LLM agents can improve debugging performance.

Repository Structure

The code in this repository is organized into three distinct branches to streamline the benchmarking and evaluation process:

  • Gemini: Benchmarking different Gemini Models
  • ChatGPT: Benchmarking different GPT Models
  • Multi-Agent-Conversation: Evaluating the performance of the Multi-Agent System

Folder Overview

  • EvalGPTFix_Extracted & Gemini_Extracted
    These folders contain the structured data extracted from the datasets used in benchmarking the models.

  • LLM_Scripts
    This folder contains the Python scripts that implement the pipeline for prompting the LLMs and generating files to store their responses.

  • Responses
    All responses generated by the LLMs are stored in sequential order. This includes:

    • CSV files
    • Code files produced during the debugging tasks
  • Tests
    This folder contains the test cases used to validate the models' repairs.

  • Analysis
    Test outcomes and the general analysis of each model’s performance can be found here.
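The repository validates the Java patches with JUnit (see the installation notes below); as a language-agnostic illustration of that validation step, a Python-flavoured equivalent might look like the following, where `gcd_fixed` is a hypothetical "repaired" function, not code from the datasets:

```python
# Sketch of how a repaired program could be validated with test cases.
# gcd_fixed is a hypothetical LLM-repaired function, used only to show
# the shape of the check; the real suite runs JUnit against Java code.

def gcd_fixed(a: int, b: int) -> int:
    # Repaired Euclidean algorithm (imagine the bug was in the loop
    # condition before the model's fix).
    while b != 0:
        a, b = b, a % b
    return a

def test_gcd_fixed():
    # The kinds of assertions a validation suite would run per patch.
    assert gcd_fixed(12, 18) == 6
    assert gcd_fixed(7, 13) == 1
    assert gcd_fixed(0, 5) == 5

test_gcd_fixed()
print("all validation tests passed")
```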


Running the Multi-Agent System or Individual LLM Scripts

To run the multi-agent system or any of the individual LLM scripts, please follow the steps below:

1. Installations

Ensure that you have the following installed:

  • Python (Version 3.7+)
    Download and install Python from here.

  • Java
    Download and install the Java JDK from here.

  • JUnit
    Install the JUnit VSCode extension (used to run the Java tests) from here.

  • Python Libraries
    Install the required libraries by running:

      pip install openai
      pip install -q -U google-generativeai
      pip install python-dotenv
      pip install pyautogen
      pip install "pyautogen[gemini]"
      pip install "pyautogen[gemini,retrievechat,lmm]"
      pip install pytest 
      pip install panel

2. .env Structure

Create a .env file in the root directory and include the following keys with your respective API credentials:

OPENAI_API_KEY=<Your OpenAI API Key Here>
GEMINI_API_KEY=<Your Gemini API Key Here>

3. OAI_CONFIG_LIST.json Structure

Create an OAI_CONFIG_LIST.json file in the root directory containing the following entries:

[
    {
        "model": "gpt-4o-mini",
        "api_key": "<Your API Key Here>"
    },
    {
        "model": "gpt-4o",
        "api_key": "<Your API Key Here>"
    },
    {
        "model": "gemini-1.5-pro",
        "api_key": "<Your API Key Here>",
        "api_type": "google"
    },
    {
        "model": "gemini-1.5-flash",
        "api_key": "<Your API Key Here>",
        "api_type": "google"
    }
]

Make sure to replace <Your API Key Here> with your actual API keys for each model.
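pyautogen loads this file through its configuration helpers; the filtering behaviour can be mimicked with the standard library alone, which is a useful sanity check that the file parses. The JSON below is an inlined sample with dummy keys, mirroring the structure above:

```python
# Sketch: parsing an OAI_CONFIG_LIST-style JSON list and selecting the
# entries for one model family, mirroring the kind of filtering
# pyautogen performs internally. The sample uses dummy keys.
import json

sample = """[
    {"model": "gpt-4o-mini", "api_key": "dummy"},
    {"model": "gemini-1.5-pro", "api_key": "dummy", "api_type": "google"}
]"""

config_list = json.loads(sample)

# Keep only the Gemini entries, e.g. to hand one agent a specific model.
gemini_configs = [c for c in config_list if c["model"].startswith("gemini")]
print(gemini_configs[0]["model"])  # -> gemini-1.5-pro
```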

4. Running the Multi-Agent System

To run the multi-agent system using the front-end panel interface, execute the following command:

panel serve Conversations_UI.py

Completed by Taahir Kolia and Muhammad Raees Dindar as part of the ELEN4012A Investigation
