AI4Bharat TTS Model Inference for Tibetan (Continuing Post-Training) #3

Open
5 tasks done
tenzinchoedon opened this issue Jan 20, 2025 · 2 comments
tenzinchoedon (Collaborator) commented Jan 20, 2025

Link to previous card: #2 (comment)

Description

This phase focuses on running inference on the fine-tuned AI4Bharat Indic-TTS model. The objective is to generate high-quality, natural-sounding Tibetan speech and validate the model's performance. Necessary debugging and adjustments will also be carried out if issues arise during inference.

Completion Criteria

  • Generate Tibetan speech audio files from textual inputs.
  • Resolve any issues related to phoneme mismatches or inference configurations.

Implementation

  1. Inference Execution
  • Run inference using the fine-tuned Tibetan TTS model and prepared Tibetan text input.
  • Save generated audio files for evaluation.
  2. Debugging
  • Investigate and resolve issues such as an unsupported phoneme language (bo, the ISO 639-1 code for Tibetan) or incorrect configurations.
  • Adjust the phoneme language or preprocessing as required.
  3. Evaluation
  • Evaluate the quality of the generated speech.
  • Compare the generated outputs with the expected results to spot any differences.
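
The debugging step for an unsupported phoneme language can be sketched as a small pre-flight check on the model config before inference is launched. This is only an illustration, not the project's actual tooling: it assumes a Coqui-TTS-style config dictionary with `use_phonemes` and `phoneme_language` keys, and the `check_phoneme_config` helper and the set of supported languages passed to it are hypothetical.

```python
from typing import Dict, Iterable, List


def check_phoneme_config(config: Dict, supported_languages: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in the phoneme settings.

    Assumption: the config uses the keys `use_phonemes` and `phoneme_language`.
    `supported_languages` must be supplied by the caller (for example, the
    languages the installed phonemizer backend reports); it is not hard-coded.
    """
    problems = []
    supported = set(supported_languages)

    # Phoneme settings only matter when phoneme conversion is switched on.
    if config.get("use_phonemes"):
        lang = config.get("phoneme_language")
        if not lang:
            problems.append("use_phonemes is enabled but phoneme_language is not set")
        elif lang not in supported:
            problems.append(
                "phoneme_language %r is not in the supported set" % lang
            )
    return problems


if __name__ == "__main__":
    # Hypothetical config fragment and backend language list, for illustration.
    example_config = {"use_phonemes": True, "phoneme_language": "bo"}
    for problem in check_phoneme_config(example_config, {"en-us", "hi"}):
        print("config problem:", problem)
```

If the check reports a problem, the options named in the list above apply: change the phoneme language, or disable phoneme conversion and rely on character-level preprocessing.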

Subtasks

  • Validate inference setup and configurations.
  • Run inference on sample Tibetan text input.
  • Save and analyze generated audio outputs for debugging and quality assessment.
  • Document issues, if any, and implement required adjustments.
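
For the "save and analyze generated audio outputs" subtask, a quick automated sanity check can catch empty or silent files before anyone listens to them. The sketch below uses only the Python standard library and assumes the generated files are 16-bit PCM WAV; the `summarize_wav` name and the 0.01 silence threshold are illustrative choices, not values from this project.

```python
import array
import sys
import wave


def summarize_wav(path, silence_threshold=0.01):
    """Return basic statistics for a 16-bit PCM WAV file.

    The peak is normalised to the range 0.0 to 1.0. A file whose peak stays
    below `silence_threshold` (an arbitrary illustrative value) is flagged
    as silent.
    """
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        sample_rate = wav.getframerate()
        n_frames = wav.getnframes()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(n_frames))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": n_frames / sample_rate,
        "peak": peak,
        "is_silent": peak < silence_threshold,
    }
```

Running this over every generated file and logging the result gives a simple record for the "document issues" subtask; it does not replace listening tests for naturalness.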

Card Reviewer

@tenzinchoedon tenzinchoedon self-assigned this Jan 20, 2025
@tenzinchoedon tenzinchoedon converted this from a draft issue Jan 20, 2025
tenzinchoedon (Collaborator, Author) commented:

Wav files from training on Weights & Biases (WandB):

TrainAudio
EvalAudio

@tenzinchoedon tenzinchoedon moved this from IN PROGRESS to DONE in STT & TTS Dev Mar 19, 2025