Add TestGenEval benchmark #5534

kjain14 · 2024-12-11T22:08:38Z

End-user friendly description of the problem this fixes or functionality that this introduces

Adds TestGenEval, a new unit test generation benchmark: https://arxiv.org/abs/2410.00752

Give a summary of what the PR does, explaining any non-trivial design decisions

The PR includes changes to:

Measure coverage
Measure mutation score
Push Docker images for TestGenEval with testing dependencies
Add prompts for measuring CodeAct performance
Compute a wide range of lexical metrics (ROUGE, CodeBLEU, readability, etc.)
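
For readers unfamiliar with the first two metrics, here is a minimal sketch of how coverage and mutation score are conventionally defined. It is not the code in this PR; the function names and the choice to return 0.0 for empty inputs are assumptions made for illustration.

```python
def line_coverage(covered_lines: int, executable_lines: int) -> float:
    """Percentage of executable lines that the generated tests run.

    Hypothetical helper: the benchmark's own implementation may differ.
    """
    if executable_lines <= 0:
        return 0.0  # assumption: treat "nothing to cover" as 0%
    return 100.0 * covered_lines / executable_lines


def mutation_score(killed_mutants: int, total_mutants: int) -> float:
    """Percentage of injected faults (mutants) that the tests detect.

    A mutant counts as "killed" when at least one test fails on it.
    Hypothetical helper: the benchmark's own implementation may differ.
    """
    if total_mutants <= 0:
        return 0.0  # assumption: no mutants generated means a score of 0%
    return 100.0 * killed_mutants / total_mutants


if __name__ == "__main__":
    print(f"coverage: {line_coverage(45, 60):.1f}%")
    print(f"mutation score: {mutation_score(3, 4):.1f}%")
```

Coverage only shows that code was executed, while mutation score checks whether the tests notice when that code is changed, which is why the two are reported together.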

Note: This is a clean version of PR #5534 that contains only the TestGenEval changes.

neubig

Thanks @kjain14!

Sorry it took me a while to get to reviewing this. The README was still a bit WIP so I asked OpenHands to update it, and now I think it reflects the flow better. I was able to run the benchmark.

However, I'm a bit stuck on evaluation. I get this when I try to install:

$ poetry install --with testgeneval
...
  - Updating botocore (1.35.84 -> 1.35.87)
  - Installing importlib-resources (6.4.5)
  - Installing kubernetes (31.0.0)
  - Installing llama-cloud (0.1.5)
  - Installing llama-index-embeddings-openai (0.3.0)
  - Installing llama-index-program-openai (0.3.0)
  - Installing llama-parse (0.5.15)
  - Installing mmh3 (5.0.1)
  - Installing onnxruntime (1.20.1)
  - Installing opentelemetry-instrumentation-fastapi (0.46b0)
  - Installing orjson (3.10.12)
  - Installing posthog (3.7.2)
  - Installing pypdf (5.1.0)
  - Installing pypika (0.48.9)
  - Installing rapidfuzz (3.11.0)
  - Installing scikit-learn (1.5.2)
  - Installing striprtf (0.0.26)
  - Updating synchronicity (0.9.7 -> 0.9.8)
  - Installing torch (2.5.1): Failed

  RuntimeError

  Unable to find installation candidates for torch (2.5.1)

  at ~/Library/Application Support/pypoetry/venv/lib/python3.13/site-packages/poetry/installation/chooser.py:74 in choose_for
       70│ 
       71│             links.append(link)
       72│ 
       73│         if not links:
    →  74│             raise RuntimeError(f"Unable to find installation candidates for {package}")
       75│ 
       76│         # Get the best link
       77│         chosen = max(links, key=lambda link: self._sort_key(package, link))
       78│ 

Cannot install torch.

Are you able to reproduce this? If so, it'd be nice if you could figure out the incompatibility. If you can't repro, I'll try to investigate further.

kjain14 · 2024-12-27T18:32:45Z

Hmm, I tried today and was not able to reproduce this. I wonder what may be causing it?

I also think this is unrelated to the testgeneval dependencies: it comes from the llama group dependencies, which list torch==2.5.1.
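
Since the failure reproduces on one machine but not the other, comparing the two environments may help narrow it down. Installers choose wheels by matching interpreter and platform tags, so a missing candidate can be environment-specific; whether that is the cause here is not established in this thread. The sketch below uses only the standard library, and `describe_environment` is a hypothetical helper name, not part of OpenHands or Poetry.

```python
import platform
import sys
import sysconfig


def describe_environment() -> dict:
    """Collect interpreter and platform details that affect wheel selection."""
    return {
        # Interpreter version, e.g. "3.12.7"
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Interpreter implementation, e.g. "CPython"
        "implementation": platform.python_implementation(),
        # Build platform string reported by sysconfig
        "platform": sysconfig.get_platform(),
        # CPU architecture, e.g. "arm64" or "x86_64"
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this with the interpreter that the project's environment actually uses, on both machines, gives values that can be compared directly.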

neubig · 2024-12-28T19:57:30Z

Hmm, I'll take another look.

Kush Dave Jain and others added 13 commits November 29, 2024 17:57

initial TestGenEval code

7a4729c

Initial pass for TestGenEval

c6206f5

Licensing

280baa2

Readability metrics

75fba59

Fixing testing dependencies

f7f2531

Add option for starting point

30197e6

Cleaning to not OOM

791b7f9

Merge pull request #1 from All-Hands-AI/main

bd66d09

Merge OpenHands

TestGenEval MVP

585dba9

+ mutation testing

3af6025

Merge pull request #2 from All-Hands-AI/main

b19f735

merging openhands

Update README

7c81deb

Merge branch 'main' of https://github.com/kjain14/OpenHands

2cd64bc

neubig reviewed Dec 11, 2024

pyproject.toml (outdated, resolved)

Kush Dave Jain added 3 commits December 12, 2024 15:50

reset

b685c67

testgeneval deps

fb9bc87

Final update, now working on all projects

77a153e

kjain14 marked this pull request as ready for review December 16, 2024 19:55

neubig self-requested a review December 16, 2024 20:08

openhands-agent and others added 3 commits December 25, 2024 21:23

Update TestGenEval README with comprehensive information

3401bd6

Merge branch 'main' of github.com:All-Hands-AI/OpenHands into kjain14…

b47da9e

…-main

Update lock file

90422e5

neubig reviewed Dec 25, 2024

neubig self-requested a review December 28, 2024 19:57

neubig self-assigned this Dec 30, 2024

Kush Dave Jain added 3 commits January 8, 2025 18:04

Any and all pass

31b6967

Reset to normal time

1ded123

Refine postprocessing

efb525a

Kush Dave Jain and others added 5 commits January 10, 2025 18:06

Refine prompt

219a134

Update prompt

3f0f13d

Update filtering

d1e8409

Only top level filtering

8848e60

Merge branch 'main' of github.com:kjain14/OpenHands into kjain14-main

3355bae

neubig changed the title from "Add TestGenEval" to "Add TestGenEval benchmark" Jan 20, 2025

More updates

9f9a65c
