This project is for learning and academic purposes only.
This is a fork of https://github.com/lutianxiong/vits_chinese
The original version of VITS : https://github.com/jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ESPnet link: github.com/espnet/espnet/tree/master/espnet2/gan_tts/vits
coqui-ai/TTS link: github.com/coqui-ai/TTS/tree/main/recipes/ljspeech/vits_tts
apt-get install espeak
pip install -r requirements.txt
cd monotonic_align
python setup.py build_ext --inplace
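Delete entries 2365 and 2762; mixed Chinese-English text is not used for training.
Alternatively, change entries 2365 and 2762 to the following. Both are labeling errors in the Baker dataset, and the English characters they contain cannot be recognized, which makes encoding fail. (This project does not use these two entries; they are removed during cleaning.)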
002365 这图#2难不成#2是#1P过的#4? zhe4 tu2 nan2 bu4 cheng2 shi4 pi1 guo4 de5
002762 我是#2善良#1活泼#3、好奇心#1旺盛的#2B型血#4。 wo3 shi4 shan4 liang2 huo2 po1 hao4 qi2 xin1 wang4 sheng4 de5 bi4 xing2 xie3
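The first option (dropping the two entries) could be sketched roughly as follows. This assumes the Baker label file stores each utterance as two lines, an ID-and-text line followed by an indented pinyin line; the helper name drop_entries and the sample lines are hypothetical and are not taken from this project's preprocess.py.

```python
from typing import Iterable, List, Set


def drop_entries(lines: Iterable[str], bad_ids: Set[str]) -> List[str]:
    """Return the label lines with the given utterance IDs (and their pinyin lines) removed."""
    kept: List[str] = []
    skip_pinyin = False
    for line in lines:
        if line[:1].isspace():
            # Indented line: assumed to be the pinyin of the preceding text line,
            # so it is kept or dropped together with that line.
            if not skip_pinyin:
                kept.append(line)
            continue
        # Text line: the first whitespace-separated token is the utterance ID.
        utt_id = line.split(maxsplit=1)[0] if line.strip() else ""
        skip_pinyin = utt_id in bad_ids
        if not skip_pinyin:
            kept.append(line)
    return kept


# Hypothetical sample in the assumed two-line format.
sample = [
    "002364\tplaceholder text\n",
    "\tplaceholder pinyin\n",
    "002365\t这图#2难不成#2是#1P过的#4?\n",
    "\tzhe4 tu2 nan2 bu4 cheng2 shi4 pi1 guo4 de5\n",
]
cleaned = drop_entries(sample, {"002365", "002762"})
```

If the label file uses a different layout, only the test for what counts as a pinyin line needs to change.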
python preprocess.py
The labels use five pause levels, split initials and finals, and no erhua (r-colored finals).
python train.py -c configs/baker_base.json -m baker_base
Training takes more than 40 hours on a single RTX 3090 (24 GB).
Switch to the corresponding model, then run inference:
python inference.py
RuntimeError: view_as_complex is only supported for float and double tensors, but got a tensor of scalar type: Half
This error is caused by half precision during audio processing; the solution is in this issue
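Independently of the fix in the issue mentioned above, one possible workaround is to keep training out of half precision altogether. Upstream VITS reads an fp16_run flag from the train section of its JSON config; assuming configs/baker_base.json in this fork has the same key (not verified here), a minimal sketch for turning it off looks like this. The helper name disable_fp16 and the example values are hypothetical.

```python
import copy


def disable_fp16(config: dict) -> dict:
    """Return a copy of a VITS-style config with train.fp16_run set to False."""
    updated = copy.deepcopy(config)
    # Assumption: the flag lives under "train", as in the upstream VITS configs.
    updated.setdefault("train", {})["fp16_run"] = False
    return updated


# Hypothetical config fragment; a real config has many more keys.
example = {"train": {"fp16_run": True, "batch_size": 8}}
patched = disable_fp16(example)
```

Full precision uses more memory and is slower, so this trades speed for avoiding the Half tensor in the spectrogram computation.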
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
This error occurs when using DistributedDataParallel. Passing find_unused_parameters=True to DDP makes it go away, but this does not seem to be the best solution.
running build_ext
copying build/lib.linux-x86_64-3.8/monotonic_align/core.cpython-38-x86_64-linux-gnu.so -> monotonic_align
error: could not create 'monotonic_align/core.cpython-38-x86_64-linux-gnu.so': No such file or directory
Create another monotonic_align folder inside the monotonic_align folder.
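The folder fix above can also be scripted. This is a minimal sketch using only the standard library; the helper name ensure_nested_dir is hypothetical, and it assumes the package directory is named monotonic_align as in the build log.

```python
from pathlib import Path


def ensure_nested_dir(package_dir: Path) -> Path:
    """Create <package_dir>/<package_dir.name> if it is missing and return its path."""
    nested = package_dir / package_dir.name
    # exist_ok=True makes the call safe to repeat before every build.
    nested.mkdir(parents=True, exist_ok=True)
    return nested


# Usage from the repository root, before running the build step:
# ensure_nested_dir(Path("monotonic_align"))
```

After the nested folder exists, rerun python setup.py build_ext --inplace from inside monotonic_align.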
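- Abnormal pauses. Boundaries are already forcibly inserted after the phonemes, and VITS then inserts boundaries again through add_blank. The relevant configuration parameter: "add_blank": false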
- Another possible cause of abnormal pauses is stochastic duration prediction. The relevant configuration parameter: use_sdp=True
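- Synthesized audio sounds electronic. Moderately raising the noise parameters at inference time can significantly improve synthesis quality: noise_scale=0.667, noise_scale_w=0.8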
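To make two of the settings above more concrete, here is a small pure-Python sketch. It mirrors what upstream VITS appears to do (interleaving a blank ID when add_blank is on, and scaling the prior noise by noise_scale at inference); whether this fork keeps that behavior unchanged is an assumption, and the IDs and numbers below are hypothetical.

```python
import math
import random
from typing import List, Sequence


def intersperse(seq: Sequence[int], blank: int) -> List[int]:
    """Place `blank` before, between, and after every item of `seq`."""
    result = [blank] * (len(seq) * 2 + 1)
    result[1::2] = seq
    return result


def sample_prior(mean: Sequence[float], log_std: Sequence[float],
                 noise_scale: float, rng: random.Random) -> List[float]:
    """Draw z = mean + eps * exp(log_std) * noise_scale elementwise, eps ~ N(0, 1)."""
    return [m + rng.gauss(0.0, 1.0) * math.exp(s) * noise_scale
            for m, s in zip(mean, log_std)]


# add_blank: a blank ID (0 here) lands around every phoneme ID, which adds
# extra boundaries on top of any already present in the labels.
phoneme_ids = [12, 7, 30]  # hypothetical IDs
with_blanks = intersperse(phoneme_ids, 0)  # [0, 12, 0, 7, 0, 30, 0]

# noise_scale: a larger value spreads samples further from the prior mean;
# 0.0 returns the mean itself.
rng = random.Random(0)
mean, log_std = [0.5, -1.0], [0.0, -0.5]
z_default = sample_prior(mean, log_std, 0.667, rng)
z_no_noise = sample_prior(mean, log_std, 0.0, rng)
```

In upstream VITS, noise_scale_w plays the analogous role for the stochastic duration predictor, so it affects timing rather than timbre.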