Speech Data Generation

A quick script to generate audio speech data using NVIDIA's Tacotron 2 and WaveGlow models.

Pre-requisites

NVIDIA GPU + CUDA cuDNN

Setup

Clone this repo: git clone https://github.com/azhou314/speech-data-generation.git
Change into the repo directory: cd speech-data-generation
Initialize WaveGlow submodule: git submodule init; git submodule update
Download the pretrained Tacotron 2 and WaveGlow models from NVIDIA and place them in the repo

Data generation

Create a .csv file describing the desired speech data. The file should have two columns: the first contains the words or phrases to generate (without punctuation), and the second contains the number of times to sample each word or phrase.
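As a minimal sketch of the format described above, the snippet below writes such a file with Python's csv module. The file name my_input.csv, the helper write_speech_csv, and the example phrases are hypothetical, and the sketch assumes the script expects comma-separated rows with no header row; compare against input.csv in the repo to confirm.

```python
import csv

def write_speech_csv(path, rows):
    """Write (phrase, sample_count) pairs as a two-column CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for phrase, count in rows:
            # Column 1: the word or phrase, without punctuation.
            # Column 2: how many times to sample it.
            writer.writerow([phrase, int(count)])

# Hypothetical entries; assumes no header row is expected.
example_rows = [("hello", 5), ("good morning", 3)]
write_speech_csv("my_input.csv", example_rows)
```

Writing the file programmatically rather than by hand makes it easy to generate many phrases with consistent sample counts.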

Words or phrases can be specified in conventional English orthography or in ARPABET.

To specify a word or phrase in ARPABET, surround it with curly braces and write it as a sequence of ARPABET codes.

The valid codes (one or two letters each) are listed below. A digit appended to a vowel marks its stress: 0 for no stress, 1 for primary stress, 2 for secondary stress.

valid_symbols = ['AA', 'AA0', 'AA1', 'AA2', 'AE', 'AE0', 'AE1', 'AE2', 'AH', 'AH0', 'AH1', 'AH2',
                 'AO', 'AO0', 'AO1', 'AO2', 'AW', 'AW0', 'AW1', 'AW2', 'AY', 'AY0', 'AY1', 'AY2',
                 'B', 'CH', 'D', 'DH', 'EH', 'EH0', 'EH1', 'EH2', 'ER', 'ER0', 'ER1', 'ER2', 'EY',
                 'EY0', 'EY1', 'EY2', 'F', 'G', 'HH', 'IH', 'IH0', 'IH1', 'IH2', 'IY', 'IY0', 'IY1',
                 'IY2', 'JH', 'K', 'L', 'M', 'N', 'NG', 'OW', 'OW0', 'OW1', 'OW2', 'OY', 'OY0',
                 'OY1', 'OY2', 'P', 'R', 'S', 'SH', 'T', 'TH', 'UH', 'UH0', 'UH1', 'UH2', 'UW',
                 'UW0', 'UW1', 'UW2', 'V', 'W', 'Y', 'Z', 'ZH']
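A quick way to catch typos before running generation is to check each ARPABET entry against this list. The sketch below is illustrative only: the helper invalid_arpabet_codes and the constant names are hypothetical, it assumes the codes inside the braces are separated by whitespace, and it handles a single brace group per entry. It rebuilds the same 84 symbols from the vowel and consonant codes rather than repeating the full list.

```python
# Rebuild the symbol set from the list above: 15 vowels, each valid bare or
# with a stress digit (0, 1, 2), plus 24 consonants.
VOWELS = ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"]
CONSONANTS = ["B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
              "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"]
VALID_SYMBOLS = set(CONSONANTS)
for v in VOWELS:
    VALID_SYMBOLS.update([v, v + "0", v + "1", v + "2"])

def invalid_arpabet_codes(entry):
    """Return the codes in a '{...}' entry that are not valid symbols.

    Assumes codes inside the braces are separated by whitespace.
    """
    entry = entry.strip()
    if not (entry.startswith("{") and entry.endswith("}")):
        raise ValueError("ARPABET entries must be wrapped in curly braces")
    return [code for code in entry[1:-1].split() if code not in VALID_SYMBOLS]

print(invalid_arpabet_codes("{HH AH0 L OW1}"))  # "hello": every code is valid
print(invalid_arpabet_codes("{HH AH0 L OH1}"))  # OH1 is not in the list
```

An empty result means every code in the entry appears in the list; any returned codes should be corrected in the .csv file.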

The corresponding IPA symbols for each code can be found here

An example of a valid .csv file is provided in input.csv.

Run python data_generation.py INPUT_FILE.csv OUTPUT_FOLDER_LOCATION
- By default, the script reads input.csv and writes the generated data directly into the /data directory
  - If OUTPUT_FOLDER_LOCATION is given, the generated data is written to /data/OUTPUT_FOLDER_LOCATION instead
- An example of pre-generated data produced from input.csv is found in the /data directory
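If generation needs to be driven from another Python program, the command above can be assembled as an argument list. This is only a sketch: build_generation_command is a hypothetical helper, my_input.csv and my_run are placeholder names, and it assumes both positional arguments may be omitted to fall back on the defaults described above.

```python
import sys

def build_generation_command(input_csv=None, output_folder=None):
    """Build the argument list for running data_generation.py."""
    command = [sys.executable, "data_generation.py"]
    if input_csv is not None:
        command.append(input_csv)
        # The output folder is the second positional argument, so it is
        # only passed when an input file is also given.
        if output_folder is not None:
            command.append(output_folder)
    return command

command = build_generation_command("my_input.csv", "my_run")
print(" ".join(command[1:]))
# To launch it from the repo root:
#   import subprocess
#   subprocess.run(command, check=True)
```

Using sys.executable keeps the call in the same Python environment as the caller, which matters when the dependencies are installed in a virtual environment.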

Acknowledgements

This project relies exclusively on code from NVIDIA's Tacotron 2 implementation.
