MBPP Dataset Preprocessing for Code-Optimize #105

JieWu02 opened this issue Feb 11, 2025 · 2 comments


@JieWu02

JieWu02 commented Feb 11, 2025

Thank you for your promising work on Code-Optimize. We greatly appreciate the effort you've put into it.

However, we’ve encountered some difficulty in reproducing the second step of annotation. Specifically, the MBPP dataset provided in the code contains the following keys:
[prompt, test, entry_point].

On the other hand, the MBPP datasets we have found online typically contain keys such as:
['task_id', 'text', 'code', 'test_list', 'test_setup_code'].

It seems there might be a missing or unclear preprocessing step that is causing this discrepancy. Could you kindly clarify this step for us, or point us in the right direction?
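
As an illustration of the mismatch described above, a conversion between the two key layouts could be sketched roughly as follows. This is only a sketch under assumptions: the function name `convert_record`, the prompt wording, and the rule for choosing `entry_point` are hypothetical and are not the Code-Optimize authors' actual preprocessing.

```python
import ast
import re


def convert_record(record):
    """Map a raw MBPP-style record to {prompt, test, entry_point}.

    Assumed mapping (hypothetical, not the authors' preprocessing):
      prompt      <- 'text' plus a hint naming the function
      test        <- 'test_setup_code' followed by the joined 'test_list'
      entry_point <- the function from 'code' that the tests call
    """
    tree = ast.parse(record["code"])
    defined = [n.name for n in tree.body if isinstance(n, ast.FunctionDef)]
    if not defined:
        raise ValueError("no top-level function found in 'code'")

    tests = "\n".join(record["test_list"])

    # Prefer the first defined function that the tests actually call;
    # fall back to the last definition if none is called directly.
    entry_point = defined[-1]
    for name in defined:
        if re.search(rf"\b{re.escape(name)}\s*\(", tests):
            entry_point = name
            break

    setup = record.get("test_setup_code", "")
    test = f"{setup}\n{tests}" if setup else tests
    prompt = f"{record['text']}\nYour function should be named `{entry_point}`."
    return {"prompt": prompt, "test": test, "entry_point": entry_point}


# Made-up record for illustration; it is not taken from MBPP.
example = {
    "task_id": 0,
    "text": "Write a function to add two numbers.",
    "code": "def helper(x):\n    return x\n\ndef add(a, b):\n    return helper(a) + b",
    "test_list": ["assert add(1, 2) == 3"],
    "test_setup_code": "",
}
converted = convert_record(example)
```

Here `converted` has exactly the keys `prompt`, `test`, and `entry_point`. The real formatted file may build the prompt differently, so it should be treated as the reference.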

Looking forward to your response, and thank you once again for your valuable contributions.

Best regards,

@milangritta

milangritta commented Feb 11, 2025

Hello, and thank you for taking an interest in our work! 👍
Sorry about this, I forgot to upload our lightly reformatted version of MBPP.
This will be uploaded into the 'datasets' folder very soon. I hope this helps.

By the way, you can also use the finished datasets, which are in the 'datasets' folder.
Skip straight to the optimization step 🥇 Have a nice day!

@JieWu02
Author

JieWu02 commented Feb 11, 2025

I see the updated MBPP data, great!
Have a nice day too!
