Python code project for my master's thesis, which can be found here: Using Large Language Models for Binary Decompilation
The following packages are needed:
radare2
python3
pip3
pkg-config # Required to install r2ghidra
TODO: Add more packages
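As a hedged example, on a Debian/Ubuntu system the system packages could be installed as follows (package names differ between distributions, and r2ghidra is usually installed through radare2's own package manager, r2pm):
sudo apt install radare2 python3 python3-pip pkg-config
r2pm -ci r2ghidra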
In addition, a .env file should be created containing an OpenAI API key, looking like:
export OPENAI_API_KEY="key here"
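To check that the key is actually exported in the current shell (assuming the .env file is sourced, for example by init_shell.sh below), something like this can be used:
source .env
[ -n "$OPENAI_API_KEY" ] && echo "API key is set" || echo "OPENAI_API_KEY is missing"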
Setting up and installing the Python requirements:
python -m venv .venv
source init_shell.sh
pip3 install -r requirements.txt
Sourcing init_shell.sh has to be done each time you start a new terminal.
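A typical session in a fresh terminal then looks roughly like this (a sketch based on the steps above; the evaluate command is explained further below):
source init_shell.sh
python main.py evaluate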
For the tests:
pip3 install -r requirements_dev.txt
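Assuming pytest and pytest-cov come from requirements_dev.txt, the tests can also be run once with the same coverage flags used further below:
pytest --cov=src --cov-report=term-missing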
To see how the main.py program can be used to run the pipeline, use python main.py -h:
usage: main.py [-h] [-d DATA_DIR] [-r RESULTS] [-m MODEL] [-s] [-o] {clean,prepare,print,evaluate}

Run pipeline commands.

positional arguments:
  {clean,prepare,print,evaluate}
                        Command to execute

options:
  -h, --help            show this help message and exit
  -d DATA_DIR, --data-dir DATA_DIR
                        Data directory
  -r RESULTS, --results RESULTS
                        Results filename
  -m MODEL, --model MODEL
                        Model name
  -s, --strip           Strip the binary during compilation
  -o, --objdump         Use objdump instead of r2 for disassembly
For the simplest use case, using the four sources in data/ (the default directory), run python main.py evaluate
For the decompile-eval tests, first run python extract_decompile-eval.py
and then python main.py -d data_decompile_eval evaluate
. The first script downloads and creates the separate source directory.
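The options above can be combined with any command; as a purely hypothetical example (the model name is a placeholder, not a tested configuration):
python main.py -d data_decompile_eval -s -o -m gpt-4o evaluate  # stripped binaries, objdump disassembly, explicit model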
Run ptw -- --cov=src --cov-report=term-missing
to continuously run the tests and produce a coverage report.