This blueprint enables you to create your own Speech-to-Text / Automatic Speech Recognition (ASR) dataset, or use the Common Voice dataset, to finetune an ASR model and improve performance for your specific language and use-case. All of this can be done locally (even on your laptop!), ensuring no data leaves your machine and safeguarding your privacy. Using Common Voice as a backbone enables this blueprint to support an impressively wide variety of languages! For the exact list of supported languages, please visit the Common Voice website.
📘 To explore this project further and discover other Blueprints, visit the Blueprints Hub.
👉 📖 For more detailed guidance on using this project, please visit our Docs here
- Python 3.10+
- Common Voice
- Hugging Face
- Gradio
Note: All scripts should be executed from the root directory of the repository.
This blueprint consists of three independent, yet complementary, components:
- Transcription app: A simple UI that lets you record your voice, pick any HF ASR model, and get an instant transcription.
- Dataset maker app: Another UI app that enables you to easily and quickly create your own Speech-to-Text dataset.
- Finetuning script: A script to finetune your own STT model, either using Common Voice data or your own local data created by the Dataset maker app.
- Use a virtual environment and install dependencies:
pip install -e .
  You will also need ffmpeg, e.g. for Ubuntu: `sudo apt install ffmpeg`, for Mac: `brew install ffmpeg`
- Try existing transcription HF models on your own language & voice locally:
python demo/transcribe_app.py
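Under the hood, an app like this wraps a Hugging Face ASR pipeline. A minimal sketch of the same idea (the model id and audio file name below are placeholders, not values mandated by the blueprint):

```python
# Minimal sketch of local transcription with a Hugging Face ASR pipeline.
# "openai/whisper-tiny" and "my_recording.wav" are placeholders -- swap in
# any ASR model from the Hub and a recording of your own voice.
from transformers import pipeline


def transcribe(audio_path: str, model_id: str = "openai/whisper-tiny") -> str:
    """Transcribe a local audio file with an ASR model from the HF Hub."""
    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(audio_path)["text"]


if __name__ == "__main__":
    # Any ffmpeg-readable audio format works (wav, mp3, ...).
    print(transcribe("my_recording.wav"))
```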
- If you are not happy with the results, you can finetune a model with data in your language from Common Voice:
- Configure `config.yaml` with the model, the Common Voice dataset id from HF, and the hyperparameters of your choice.
- Finetune a model:

python src/speech_to_text_finetune/finetune_whisper.py
- Try the transcription app again with your newly finetuned model.
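The finetuning script is driven by `config.yaml`. The exact schema is defined by the blueprint itself, so treat the snippet below as an illustrative sketch rather than the authoritative format:

```yaml
# Illustrative sketch of a config.yaml -- the key names here are assumptions;
# check the repository's example config for the authoritative schema.
model_id: openai/whisper-small                     # HF model to finetune
dataset_id: mozilla-foundation/common_voice_17_0   # or a local data directory
language: Hindi
training_hp:                                       # training hyperparameters
  num_train_epochs: 3
  learning_rate: 1.0e-5
  per_device_train_batch_size: 16
```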
- If the results are still not satisfactory, create your own Speech-to-Text dataset and model.
- Create a dataset:
python demo/make_local_dataset_app.py
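A Speech-to-Text dataset like the one this app produces boils down to audio recordings paired with their transcriptions. Assuming a simple CSV index (the file and column names below are hypothetical; the app's actual layout may differ), pairing clips with text looks like:

```python
# Sketch of a local STT dataset: audio files plus a text index pairing each
# clip with its transcription. The "text.csv" name and its "audio"/"sentence"
# columns are assumptions for illustration, not the app's guaranteed layout.
import csv
from pathlib import Path


def build_index(data_dir: str) -> list[dict]:
    """Pair each recorded clip with its transcription from text.csv."""
    rows = []
    with open(Path(data_dir) / "text.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # expects columns: audio, sentence
            rows.append({
                "audio": str(Path(data_dir) / row["audio"]),
                "sentence": row["sentence"],
            })
    return rows
```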
- Configure `config.yaml` with the model, the local data directory, and the hyperparameters of your choice.
- Finetune a model:

python src/speech_to_text_finetune/finetune_whisper.py
- Finally, try the transcription app again with the new model finetuned specifically for your own voice!
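Beyond listening to the results, a quick way to judge whether finetuning actually helped is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal self-contained implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Rolling-array edit distance: d[j] = distance(ref[:i], hyp[:j]).
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r != h))   # substitution (free if words match)
            prev = cur
    return d[-1] / max(len(ref), 1)


# Lower is better: 0.0 means a perfect transcription.
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution out of 6 words
```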
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
Contributions are welcome! To get started, you can check out the CONTRIBUTING.md file.