General Plan #71

Open
egorsmkv opened this issue Feb 26, 2025 · 0 comments

What we want:

  1. A massive corpus (~5k-10k hours) of Ukrainian speech from different domains: audiobooks, broadcast speech, room-speaking, online conferences, etc.
  2. Open-source models that developers can easily use
  3. Evaluation datasets to check the quality of existing Speech-to-Text models (see "Evaluate Speech-to-Text models" #52)
  4. A container-based test-machine for all STT/TTS models that generates JSONL files for automated evaluation (predictions + references, in the STT case) together with metadata (RTF, GPU card, etc.)
  5. Leaderboards for the STT and TTS tasks (see "Add Speech-to-Text leaderboard" #60 and "Add Text-to-Speech leaderboard" #63)
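
To make item 4 more concrete, here is a minimal sketch of what one JSONL record from the test-machine could look like. The field names (`model`, `audio_path`, `prediction`, `reference`, `rtf`, `gpu`) and the helper functions are assumptions for illustration only, not a format defined in this plan. RTF is computed as processing time divided by audio duration, which is the usual definition.

```python
import json


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def make_record(model, audio_path, prediction, reference,
                processing_seconds, audio_seconds, gpu):
    """Build one evaluation record (hypothetical schema)."""
    return {
        "model": model,
        "audio_path": audio_path,
        "prediction": prediction,
        "reference": reference,
        "rtf": real_time_factor(processing_seconds, audio_seconds),
        "gpu": gpu,
    }


def write_jsonl(records, stream):
    """Write one JSON object per line; keep Cyrillic text readable."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Passing `ensure_ascii=False` keeps Ukrainian text human-readable in the output file instead of escaping it as `\uXXXX` sequences.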

How to achieve it:

Task 1:

  1. Create a dataset with pseudo-labels using a multilingual ASR model (for example, Whisper)
  2. Filter out non-Ukrainian samples
  3. Align the data using a CTC-based model to produce a cleaner dataset we can use in further modeling
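
Step 2 of Task 1 could be sketched roughly as follows, assuming the pseudo-labeling step stores a detected language code and its probability next to each transcript. The `PseudoLabeledSample` fields, the thresholds, and the letter-ratio heuristic are all assumptions made for this example, not part of the plan.

```python
from dataclasses import dataclass

# The 33 letters of the Ukrainian alphabet, lowercase.
UKRAINIAN_LETTERS = set("абвгґдеєжзиіїйклмнопрстуфхцчшщьюя")


@dataclass
class PseudoLabeledSample:
    # Hypothetical record produced by the pseudo-labeling step.
    audio_path: str
    text: str
    language: str         # language code reported by the ASR model
    language_prob: float  # confidence of the language detection


def ukrainian_letter_ratio(text):
    """Share of alphabetic characters that belong to the Ukrainian alphabet."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in UKRAINIAN_LETTERS for ch in letters) / len(letters)


def keep_sample(sample, min_prob=0.8, min_ratio=0.9):
    """Keep a sample only if it looks Ukrainian by all three checks."""
    return (
        sample.language == "uk"
        and sample.language_prob >= min_prob
        and ukrainian_letter_ratio(sample.text) >= min_ratio
    )


def filter_ukrainian(samples):
    return [s for s in samples if keep_sample(s)]
```

The letter-ratio check is only a cheap extra guard: many Russian words use letters that also exist in Ukrainian, so the detected language label remains the main signal.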

Task 2:

  1. Fine-tune existing models on the aligned data

Task 3:

  1. Create more test sets covering different domains

Task 4:

  1. Convert the scripts from https://github.com/egorsmkv/speech-recognition-uk/tree/master/speech-to-text into containerized test images

Task 5:

  1. Create the leaderboards as tables with all the metadata we need. The tables should be generated automatically from the JSONL files produced by the test-machine.
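
One way the table generation for the STT leaderboard could work, as a rough sketch: read the test-machine records, compute word error rate (WER) per model with a word-level edit distance, and print a Markdown table. The record fields (`model`, `prediction`, `reference`, `rtf`, `gpu`) and the column layout are assumptions for illustration, and no text normalization is applied before scoring.

```python
import json
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1], len(ref)


def build_leaderboard(jsonl_lines):
    """Aggregate records per model; rows are sorted by WER, best first."""
    stats = defaultdict(lambda: {"errors": 0, "words": 0,
                                 "rtf_sum": 0.0, "n": 0, "gpu": ""})
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        errors, words = word_errors(rec["reference"], rec["prediction"])
        s = stats[rec["model"]]
        s["errors"] += errors
        s["words"] += words
        s["rtf_sum"] += rec["rtf"]
        s["n"] += 1
        s["gpu"] = rec["gpu"]
    rows = []
    for model, s in stats.items():
        wer = s["errors"] / s["words"] if s["words"] else 0.0
        rows.append((model, wer, s["rtf_sum"] / s["n"], s["gpu"]))
    return sorted(rows, key=lambda row: row[1])


def to_markdown(rows):
    lines = ["| Model | WER | Mean RTF | GPU |", "| --- | --- | --- | --- |"]
    for model, wer, rtf, gpu in rows:
        lines.append(f"| {model} | {wer:.2%} | {rtf:.3f} | {gpu} |")
    return "\n".join(lines)
```

WER is pooled over all utterances of a model (total errors divided by total reference words) rather than averaged per utterance, so long and short clips are weighted by their length.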

egorsmkv pinned this issue on Feb 26, 2025