Prepare data for SFT #8

Merged
merged 12 commits into from
Oct 31, 2024

chiffonng
Owner

@chiffonng chiffonng commented Oct 30, 2024

Closes #5. Prepare data for supervised fine-tuning (SFT). Data can be viewed here.

  • Log in to the Hugging Face Hub (doc)
  • Train/test split (train + validation)
  • Push data to the HF Hub
  • Load data from the HF Hub

@chiffonng chiffonng changed the title Close #5 Prepare data for SFT Closes #5 Prepare data for SFT Oct 30, 2024
@chiffonng chiffonng changed the title Closes #5 Prepare data for SFT Prepare data for SFT Oct 31, 2024
- Convert dataset into datasets.Dataset format
- Train test split the data
- Push data to HF hub
- Load data from HF hub
- Remove redundant functions
- Swap strings with constants
@chiffonng chiffonng marked this pull request as ready for review October 31, 2024 09:40
@chiffonng chiffonng merged commit 53b5489 into main Oct 31, 2024
1 check passed
@chiffonng chiffonng self-assigned this Oct 31, 2024
@chiffonng chiffonng added feature New feature High priority Created by Linear-GitHub Sync labels Oct 31, 2024
Development

Successfully merging this pull request may close these issues.

[CAP-16] Prepare data for SFT