
Add TestGenEval benchmark #5534

Open · wants to merge 28 commits into main

kjain14 commented Dec 11, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

Adds TestGenEval, a new unit test generation benchmark: https://arxiv.org/abs/2410.00752


Give a summary of what the PR does, explaining any non-trivial design decisions

This PR adds:

  • Coverage measurement
  • Mutation score measurement
  • Pushed Docker images for TestGenEval that include the testing dependencies
  • Prompts for measuring CodeAct performance
  • A wide range of lexical metrics (ROUGE, CodeBLEU, readability, etc.)
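The coverage, mutation score, and lexical metrics listed above can be illustrated with a minimal Python sketch. This is not the TestGenEval implementation: the function names and the `MutantResult` record are hypothetical, coverage is simplified to line coverage, the mutation score does not filter out equivalent mutants, and ROUGE-L is computed as a plain F1 over whitespace tokens.

```python
from dataclasses import dataclass


@dataclass
class MutantResult:
    # Hypothetical record: one mutant of the code under test and whether
    # the generated test suite detected ("killed") it.
    mutant_id: str
    killed: bool


def line_coverage(executed: set[int], executable: set[int]) -> float:
    """Fraction of executable lines that the generated tests actually ran."""
    if not executable:
        return 0.0
    return len(executed & executable) / len(executable)


def mutation_score(results: list[MutantResult]) -> float:
    """Killed mutants divided by all mutants (no equivalent-mutant filtering)."""
    if not results:
        return 0.0
    return sum(r.killed for r in results) / len(results)


def rouge_l_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-L F1 over token lists, based on the longest common subsequence."""
    m, n = len(candidate), len(reference)
    if m == 0 or n == 0:
        return 0.0
    # dp[i][j] holds the LCS length of candidate[:i] and reference[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if candidate[i - 1] == reference[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    precision = lcs / m
    recall = lcs / n
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(line_coverage({1, 2, 3}, {1, 2, 3, 4}))
    print(mutation_score([MutantResult("m1", True), MutantResult("m2", False)]))
    print(rouge_l_f1("assert add(1, 2) == 3".split(), "assert add(2, 1) == 3".split()))
```

For example, tests that execute 3 of 4 executable lines give a line coverage of 0.75, and killing 1 of 2 mutants gives a mutation score of 0.5. Mutation score is usually the stricter signal, since a test can execute a line without asserting anything about its behavior.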

Note: This is a clean version of PR #5534 that contains only the TestGenEval changes.

Review comment on pyproject.toml (outdated, resolved)
kjain14 marked this pull request as ready for review December 16, 2024 19:55
neubig self-requested a review December 16, 2024 20:08
neubig (Contributor) left a comment

Thanks @kjain14!

Sorry it took me a while to get to reviewing this. The README was still a bit WIP, so I asked OpenHands to update it, and I think it now reflects the flow better. I was able to run the benchmark.

However, I'm a bit stuck on evaluation. I get this error when I try to install:

$ poetry install --with testgeneval
...
  - Updating botocore (1.35.84 -> 1.35.87)
  - Installing importlib-resources (6.4.5)
  - Installing kubernetes (31.0.0)
  - Installing llama-cloud (0.1.5)
  - Installing llama-index-embeddings-openai (0.3.0)
  - Installing llama-index-program-openai (0.3.0)
  - Installing llama-parse (0.5.15)
  - Installing mmh3 (5.0.1)
  - Installing onnxruntime (1.20.1)
  - Installing opentelemetry-instrumentation-fastapi (0.46b0)
  - Installing orjson (3.10.12)
  - Installing posthog (3.7.2)
  - Installing pypdf (5.1.0)
  - Installing pypika (0.48.9)
  - Installing rapidfuzz (3.11.0)
  - Installing scikit-learn (1.5.2)
  - Installing striprtf (0.0.26)
  - Updating synchronicity (0.9.7 -> 0.9.8)
  - Installing torch (2.5.1): Failed

  RuntimeError

  Unable to find installation candidates for torch (2.5.1)

  at ~/Library/Application Support/pypoetry/venv/lib/python3.13/site-packages/poetry/installation/chooser.py:74 in choose_for
       70│ 
       71│             links.append(link)
       72│ 
       73│         if not links:
    →  74│             raise RuntimeError(f"Unable to find installation candidates for {package}")
       75│ 
       76│         # Get the best link
       77│         chosen = max(links, key=lambda link: self._sort_key(package, link))
       78│ 

Cannot install torch.

Are you able to reproduce this? If so, it'd be nice if you could figure out the incompatibility. If you can't repro, I'll try to investigate further.
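An "Unable to find installation candidates" error usually means that none of the published files for the pinned version match the local interpreter and platform. The following minimal Python sketch illustrates that kind of check. It is not Poetry's chooser logic: the helper names are hypothetical, the file names in the example are made up, and it only parses wheel names following the PEP 427 convention. Unlike pip or Poetry, it ignores pure-Python (`py3-none-any`) wheels, sdists, ABI tags, and macOS version constraints.

```python
import sys


def wheel_tags(filename: str) -> set[tuple[str, str, str]]:
    """Expand a PEP 427 wheel file name into its (python, abi, platform) tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: 5 fields, or 6 with a build tag.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    py_tags, abi_tags, plat_tags = parts[-3], parts[-2], parts[-1]
    # Each field may be a dot-separated compressed tag set, e.g. "py2.py3".
    return {
        (py, abi, plat)
        for py in py_tags.split(".")
        for abi in abi_tags.split(".")
        for plat in plat_tags.split(".")
    }


def has_candidate(filenames: list[str], python_tag: str, platform_prefix: str) -> bool:
    """True if any wheel targets this CPython tag on a platform with the given prefix."""
    for name in filenames:
        for py, _abi, plat in wheel_tags(name):
            if py == python_tag and plat.startswith(platform_prefix):
                return True
    return False


def current_python_tag() -> str:
    """CPython tag of the running interpreter, e.g. 'cp312'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    files = [
        "example-1.0-cp312-cp312-macosx_11_0_arm64.whl",
        "example-1.0-cp313-cp313-manylinux1_x86_64.whl",
    ]
    print(current_python_tag(), has_candidate(files, current_python_tag(), "macosx"))
```

Given the real list of file names for the pinned torch version and the Python version of the project environment, a `False` result would point to a Python-version or platform gap rather than a dependency conflict.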

kjain14 (Author) commented Dec 27, 2024

Hmm, I tried today and wasn't able to reproduce this. I'm wondering what might be causing it.

I also don't think this is related to the testgeneval dependencies; it comes from the llama group dependencies, which pin torch==2.5.1.

neubig self-requested a review December 28, 2024 19:57

neubig (Contributor) commented Dec 28, 2024

Hmm, I'll take another look.

neubig self-assigned this Dec 30, 2024
neubig changed the title from "Add TestGenEval" to "Add TestGenEval benchmark" Jan 20, 2025