Fixed typos #7572

Open. Wants to merge 1 commit into base: main
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -95,7 +95,7 @@ Note that if any files were formatted by `pre-commit` hooks during committing, y
git push -u origin a-descriptive-name-for-my-changes
```

- Go the webpage of your fork on GitHub. Click on "Pull request" to send your to the project maintainers for review.
+ Go the webpage of your fork on GitHub. Click on "Pull request" to send your changes to the project maintainers for review.

## Datasets on Hugging Face

2 changes: 1 addition & 1 deletion docs/source/stream.mdx
@@ -190,7 +190,7 @@ Define sampling probabilities from each of the original datasets for more contro
{'text': 'Chevrolet Cavalier Usados en Bogota - Carros en Vent...'}]
```

- Around 80% of the final dataset is made of the `en_dataset`, and 20% of the `fr_dataset`.
+ Around 80% of the final dataset is made of the `es_dataset`, and 20% of the `fr_dataset`.

You can also specify the `stopping_strategy`. The default strategy, `first_exhausted`, is a subsampling strategy, i.e the dataset construction is stopped as soon one of the dataset runs out of samples.
You can specify `stopping_strategy=all_exhausted` to execute an oversampling strategy. In this case, the dataset construction is stopped as soon as every samples in every dataset has been added at least once. In practice, it means that if a dataset is exhausted, it will return to the beginning of this dataset until the stop criterion has been reached.
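
For readers of this hunk, the difference between the two stopping strategies described in the context lines can be illustrated with a small pure-Python sketch. It is not the library's implementation: `interleave` is a hypothetical helper, it alternates between sources in strict round-robin order instead of sampling with probabilities, and the exact point at which the real library stops may differ.

```python
def interleave(sources, stopping_strategy="first_exhausted"):
    """Hypothetical round-robin interleaving over plain lists.

    Illustrates subsampling ("first_exhausted") versus oversampling
    ("all_exhausted"); sampling probabilities are deliberately ignored.
    """
    if stopping_strategy not in ("first_exhausted", "all_exhausted"):
        raise ValueError(f"unknown stopping_strategy: {stopping_strategy!r}")
    if not sources or any(len(source) == 0 for source in sources):
        return []

    cursors = [0] * len(sources)       # number of samples drawn per source
    seen_all = [False] * len(sources)  # True once a source was fully read
    result = []
    while True:
        for i, source in enumerate(sources):
            # An exhausted source wraps around to its beginning.
            result.append(source[cursors[i] % len(source)])
            cursors[i] += 1
            if cursors[i] >= len(source):
                seen_all[i] = True
            if stopping_strategy == "first_exhausted":
                done = any(seen_all)
            else:
                done = all(seen_all)
            if done:
                return result


short, long = [1, 2], [10, 20, 30]
print(interleave([short, long]))                   # [1, 10, 2]
print(interleave([short, long], "all_exhausted"))  # [1, 10, 2, 20, 1, 30]
```

With `first_exhausted`, the sample `30` never appears because construction stops once the shorter list runs out. With `all_exhausted`, the shorter list restarts from its beginning, so `1` appears twice before every sample of the longer list has been added.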