
Commit 58610b1

Provides README.md for TTS recipes (#1491)

* Update README.md

1 parent 2f102eb commit 58610b1

File tree

2 files changed (+75, -0 lines)


egs/ljspeech/TTS/README.md (+38 lines)

@@ -0,0 +1,38 @@
# Introduction

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the [LibriVox](https://librivox.org/) project and is also in the public domain.

The above information is from the [LJSpeech website](https://keithito.com/LJ-Speech-Dataset/).

# VITS

This recipe provides a VITS model trained on the LJSpeech dataset.

A pretrained model can be found [here](https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28).

For a tutorial and more details, please refer to the [VITS documentation](https://k2-fsa.github.io/icefall/recipes/TTS/ljspeech/vits.html).

The training command is given below:

```
export CUDA_VISIBLE_DEVICES=0,1,2,3
./vits/train.py \
  --world-size 4 \
  --num-epochs 1000 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir vits/exp \
  --max-duration 500
```

To run inference, use:

```
./vits/infer.py \
  --exp-dir vits/exp \
  --epoch 1000 \
  --tokens data/tokens.txt
```
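
If you only want to try the pretrained model linked above, a minimal sketch of fetching it with the `huggingface_hub` Python package (an assumption of this sketch; the package is installed separately and is not part of the recipe):

```
# Minimal sketch: download the pretrained LJSpeech VITS checkpoint from
# Hugging Face. Assumes `pip install huggingface_hub`; the repo id is the
# one linked above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="Zengwei/icefall-tts-ljspeech-vits-2024-02-28")
print(f"Pretrained model files are in: {local_dir}")
```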

egs/vctk/TTS/README.md (+37 lines)

@@ -0,0 +1,37 @@
# Introduction

The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage, and an elicitation paragraph used for the speech accent archive.

The newspaper texts were taken from Herald Glasgow, with permission from Herald & Times Group. Each speaker has a different set of the newspaper texts, selected based on a greedy algorithm that increases the contextual and phonetic coverage; a toy sketch of the idea follows this introduction. The details of the text selection algorithms are described in the following paper: [C. Veaux, J. Yamagishi and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database"](https://doi.org/10.1109/ICSDA.2013.6709856).

The above information is from the [CSTR VCTK website](https://datashare.ed.ac.uk/handle/10283/3443).
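
The cited paper gives the actual selection procedure. Purely as an illustration, a toy sketch of greedy coverage-based selection could look like the following; `units_of` is a hypothetical helper (e.g. mapping a sentence to its set of diphones), and none of this is taken from the paper:

```
# Toy sketch of greedy coverage-based sentence selection (illustration only,
# not the algorithm from the paper cited above).
def greedy_select(sentences, units_of, budget=400):
    """Pick up to `budget` sentences, each time taking the one that adds
    the most not-yet-covered units (e.g. diphones)."""
    covered, chosen = set(), []
    pool = list(sentences)
    for _ in range(budget):
        best = max(pool, key=lambda s: len(units_of(s) - covered), default=None)
        if best is None or not (units_of(best) - covered):
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= units_of(best)
        pool.remove(best)
    return chosen
```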

# VITS

This recipe provides a VITS model trained on the VCTK dataset.

A pretrained model can be found [here](https://huggingface.co/zrjin/icefall-tts-vctk-vits-2023-12-05); note that this model was pretrained on the Edinburgh DataShare VCTK dataset.

For a tutorial and more details, please refer to the [VITS documentation](https://k2-fsa.github.io/icefall/recipes/TTS/vctk/vits.html).

The training command is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./vits/train.py \
  --world-size 4 \
  --num-epochs 1000 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir vits/exp \
  --tokens data/tokens.txt \
  --max-duration 350
```

To run inference, use:

```
./vits/infer.py \
  --epoch 1000 \
  --exp-dir vits/exp \
  --tokens data/tokens.txt \
  --max-duration 500
```
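
Both commands above read the token table passed via `--tokens`. As a rough illustration, here is a minimal sketch of parsing such a file, assuming the common icefall layout of one `<symbol> <id>` pair per line (an assumption; the actual file is produced by the recipe's data preparation and is not shown in this commit):

```
# Minimal sketch: parse a token table like data/tokens.txt, assuming one
# "<symbol> <id>" pair per line; lines that do not match are skipped.
def read_tokens(path):
    token2id = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split()
            if len(parts) == 2:
                symbol, idx = parts
                token2id[symbol] = int(idx)
    return token2id

token2id = read_tokens("data/tokens.txt")
print(f"Loaded {len(token2id)} tokens")
```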
