| driven image | original | gfpgan |
|---|---|---|
|  | original_comic.mp4 | gpfgan_comic.mp4 |
| driven image | original | gfpgan |
|---|---|---|
|  | talking_2024-05-14-16-34-25_liuyin.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4 |
| driven image | happy | scared | neutral |
|---|---|---|---|
|  | happy_sound.mp4 | scared_sound.mp4 | neural_sound.mp4 |
| liwen | fufu | liuying |
|---|---|---|
| me_no_back.mp4 | fufu.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4 |
| Chinese | English |
|---|---|
| talking_2024-05-03-05-14-59.mp4 | talking_2024-05-06-22-12-02.mp4 |
| pose1 | pose2 | pose3 |
|---|---|---|
| pose1.mp4 | pose2.mp4 | pose3.mp4 |
- System environment
  - Python 3.9.19
  - Ubuntu 20.04.1
  - GPU: 2 × NVIDIA RTX 4090
```bash
git clone https://github.com/lililuya/Graduation-Project.git
cd Graduation-Project/env
```
- Use conda or pip:

```bash
# If using conda, edit the `prefix` field in environment.yml (or delete it
# to install the environment to the default location)
conda env create -f environment.yml

# If using pip, first remove the local package index entries from requirements.txt
pip install -r requirements.txt
pip install funasr==1.0.22
pip install modelscope==1.13.3
```
- The most likely problem is a numba version mismatch; if it occurs, upgrade numba:

```bash
pip install -U numba
```
- EAT weights
  - After downloading, place them under `ckpt` in the project root.
- GFPGAN weights
  - After downloading, place them under `restoration` in the project root.
- GPT-SoVITS weights
  - After downloading, place them under `GPT_SoVits/weights` in the project root.
- MODNet weights
  - After downloading, place them under `pretrain`.
- DeepSpeech
  - Refer to RAD-NeRF.
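Taken together, the placement steps above assume a directory layout like the one sketched below. This minimal Python snippet pre-creates those directories (names taken from this README; the checkpoint filenames themselves depend on each download):

```python
from pathlib import Path

# Directory names from the weight-placement steps above; run from the repo root.
WEIGHT_DIRS = [
    "ckpt",                # EAT weights
    "restoration",         # GFPGAN weights
    "GPT_SoVits/weights",  # GPT-SoVITS weights
    "pretrain",            # MODNet weights
]

for d in WEIGHT_DIRS:
    Path(d).mkdir(parents=True, exist_ok=True)
```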
```bash
# Set launch=True inside the script before running
python whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py
```
- For some configuration options, refer to Gradio network traversal (exposing the app beyond localhost).
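The network-traversal settings amount to a few arguments to Gradio's `launch()`. A hedged sketch follows; the `demo` object and the exact values are assumptions, not taken from the pipeline script:

```python
# Typical Gradio options for making the app reachable over the network.
# `demo` stands in for the gr.Blocks/gr.Interface object built by the script.
launch_kwargs = dict(
    server_name="0.0.0.0",  # listen on all network interfaces
    server_port=7860,       # Gradio's default port
    share=True,             # also create a temporary public *.gradio.live link
)
# demo.launch(**launch_kwargs)
```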
```bibtex
@InProceedings{Gan_2023_ICCV,
  author    = {Gan, Yuan and Yang, Zongxin and Yue, Xihang and Sun, Lingyun and Yang, Yi},
  title     = {Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {22634-22645}
}

@InProceedings{wang2021gfpgan,
  author    = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
  title     = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}

@inproceedings{gao22b_interspeech,
  author    = {Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title     = {Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  booktitle = {Proc. Interspeech 2022},
  year      = {2022},
  pages     = {2063--2067},
  doi       = {10.21437/Interspeech.2022-9996}
}

@inproceedings{du2022glm,
  title     = {GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author    = {Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages     = {320--335},
  year      = {2022}
}
```
- Chinese lip-sync problem; see the related issue.
- Running requires 6 GB + 10 GB of GPU memory; the memory footprint is too large.
- The results shown above are not great, because the chosen source images are blurry and the super-resolution model loses some precision when exported to ONNX.
- Blend the generated head back into the body (suggested by the EAT authors).
- Fix background jitter (noted by the EAT authors); this repo adopts the MODNet matting approach.
- Speed up DeepSpeech: extracting audio features currently takes a very long time (using deepspeech-0.1).
- Custom loading of GPT-SoVITS models, trading memory for speed: each model is roughly 1.8 GB, and loading could be driven by a config file.
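The last item (config-driven model loading, trading memory for speed) could look roughly like the sketch below; the config mapping, cache, and loader are all hypothetical, with the real checkpoint load left as a placeholder:

```python
# Cache of already-loaded models: each GPT-SoVITS checkpoint is ~1.8 GB,
# so each one is loaded once on first request and then reused.
_cache = {}

def get_voice_model(name, config):
    """Return the model for `name`, loading it on first use.

    `config` is a hypothetical {voice name: checkpoint path} mapping,
    e.g. read from a JSON/YAML config file.
    """
    if name not in _cache:
        path = config[name]
        # Placeholder for the real loader, e.g. torch.load(path)
        _cache[name] = {"path": path}
    return _cache[name]
```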
This project uses EAT as its core model; it is purely an experimental exploration and serves no other purpose.