Ability to choose columns from dataset #2828

Open
etemiz opened this issue Dec 31, 2024 · 2 comments

@etemiz

etemiz commented Dec 31, 2024

Describe the feature

I want to be able to choose a specific column (`gt_response`), because the `response` column contains the thinking process.

Dataset: https://huggingface.co/datasets/yingyingzhang/metamath-qwen2-math

Paste any useful information

When I add this to dataset-info.json:

```json
"metamath": {
  "hf_dataset_id": "yingyingzhang/metamath-qwen2-math",
  "split": ["validation"],
  "columns": {
    "gt_response": "response"
  }
},
```

I get the following error.

```
ValueError: New column name response already in the dataset. Please choose a column name which is not already in the dataset. Current columns in the dataset: ['prompt', 'response', 'gt_response', 'gold_ans', 'pred_ans', 'type', 'score', 'query']
```

Is there a way to overwrite existing columns, or to specify what to download from HF or what to include in training?

Thanks!
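The wording of this error appears to match the strict check in the Hugging Face `datasets` library's `Dataset.rename_column`, which refuses to rename a column onto a name that is still present. The sketch below mimics that kind of check on a plain list of column names to show why mapping `gt_response` to `response` fails while the original `response` column still exists. The helper `strict_rename` is hypothetical and is not the loader's or the library's actual code.

```python
# Column names copied from the error message in this issue.
COLUMNS = ['prompt', 'response', 'gt_response', 'gold_ans',
           'pred_ans', 'type', 'score', 'query']


def strict_rename(columns, old, new):
    """Hypothetical helper: rename `old` to `new`, refusing to overwrite.

    Mirrors the kind of check that produced the error above; it is a
    sketch, not the real library implementation.
    """
    if new in columns:
        raise ValueError(f"New column name {new} already in the dataset.")
    return [new if name == old else name for name in columns]


try:
    strict_rename(COLUMNS, "gt_response", "response")
except ValueError as err:
    print(err)  # New column name response already in the dataset.

# Dropping the old `response` column first frees the name, so the rename works.
without_response = [name for name in COLUMNS if name != "response"]
print(strict_rename(without_response, "gt_response", "response"))
```

In other words, the rename itself is fine; it only needs the existing `response` column to be removed before it runs.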

@Jintao-Huang
Collaborator

You might be able to do it this way:

```json
"columns": {
  "gt_response": "response",
  "response": "_"
}
```

@etemiz
Author

etemiz commented Jan 2, 2025

This worked:

```json
"columns": {
  "response": "-",
  "gt_response": "response"
}
```
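The working mapping can be read as "drop `response`, then rename `gt_response` to `response`". The sketch below applies a mapping with those assumed semantics to one made-up example row: a target of `"-"` is treated as a drop marker, and all drops happen before any rename so the freed name can be reused. Both the drop-marker interpretation and the helper `apply_columns` are assumptions inferred from this thread, not documented behavior. Note that the maintainer's suggestion used `"_"`, while the version reported to work used `"-"`.

```python
DROP = "-"  # assumed drop marker, taken from the working config above


def apply_columns(row, columns):
    """Hypothetical sketch: drop columns mapped to DROP, then rename the rest."""
    # Pass 1: remove every column whose target is the drop marker.
    kept = {key: value for key, value in row.items() if columns.get(key) != DROP}
    # Pass 2: rename the remaining mapped columns, rejecting real collisions.
    renamed = {}
    for key, value in kept.items():
        new_key = columns.get(key, key)
        if new_key in renamed:
            raise ValueError(f"Duplicate column name after mapping: {new_key}")
        renamed[new_key] = value
    return renamed


# Made-up row using a subset of the dataset's column names.
row = {
    "prompt": "What is 2 + 3?",
    "response": "<long thinking process>",
    "gt_response": "2 + 3 = 5",
    "gold_ans": "5",
}
mapping = {"response": "-", "gt_response": "response"}
print(apply_columns(row, mapping))
```

Running the drops first is what makes the reuse of `response` possible; with only `{"gt_response": "response"}`, the same function raises because both columns would land on `response`.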
