A high-quality emotional virtual human system based on large models (Gradio+FUNASR+ChatGLM2-6B+GPT-SOVITS+EAT+GFPGAN)

lililuya/Graduation-Project

A High-Quality Emotional Virtual Human System Based on Large Models

  • System flowchart (image)
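
The data flow between the components named in the title can be sketched as follows. This is a minimal sketch and not the repository's code: the stage order (speech recognition, then the language model, then speech synthesis, then talking-head generation, then face restoration) is inferred from each component's usual role, and every name and signature below is hypothetical. Each stage is passed in as a plain callable, so stubs can stand in for FUNASR, ChatGLM2-6B, GPT-SOVITS, EAT and GFPGAN.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Hypothetical stage callables; the real models would be wrapped to fit."""
    asr: Callable[[str], str]                     # audio path -> recognized text (FUNASR)
    llm: Callable[[str], str]                     # user text -> reply text (ChatGLM2-6B)
    tts: Callable[[str, str], str]                # reply text, voice -> wav path (GPT-SOVITS)
    talking_head: Callable[[str, str, str], str]  # image, wav, emotion -> video path (EAT)
    restore: Callable[[str], str]                 # video path -> restored video path (GFPGAN)


def run_pipeline(stages: Stages, audio_in: str, image: str, voice: str, emotion: str) -> str:
    """Chain the stages: each one consumes the previous stage's output."""
    question = stages.asr(audio_in)
    answer = stages.llm(question)
    speech = stages.tts(answer, voice)
    raw_video = stages.talking_head(image, speech, emotion)
    return stages.restore(raw_video)


# Stub stages that only build file names, to show the wiring.
stub = Stages(
    asr=lambda wav: "hello",
    llm=lambda text: f"reply to {text}",
    tts=lambda text, voice: f"{voice}.wav",
    talking_head=lambda img, wav, emo: f"{emo}_{wav}.mp4",
    restore=lambda video: f"restored_{video}",
)
result = run_pipeline(stub, "question.wav", "face.png", "liwen", "happy")
print(result)
```

Keeping the stages as injected callables means a single stage (for example the face restoration step) can be swapped or skipped without touching the rest of the chain.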

1. Some test results

1.1 Cartoon portrait test

driven image | original | gfpgan
(image) | original_comic.mp4 | gpfgan_comic.mp4

1.2 Synthetic character test

driven image | original | gfpgan
(image) | talking_2024-05-14-16-34-25_liuyin.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4

1.3 Different expressions test

driven image | happy | scared | neutral
(image) | happy_sound.mp4 | scared_sound.mp4 | neural_sound.mp4

1.4 Different voices test

liwen | fufu | liuying
me_no_back.mp4 | fufu.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4

1.5 Different languages test

Chinese | English
talking_2024-05-03-05-14-59.mp4 | talking_2024-05-06-22-12-02.mp4

1.6 Different poses test

pose1 | pose2 | pose3
pose1.mp4 | pose2.mp4 | pose3.mp4

2. Environment setup

2.1 Prepare the EAT and GPT-SOVITS environments

  • System environment
    • Python 3.9.19
    • Ubuntu 20.04.1
    • Graphics cards: 2 × 4090
git clone https://github.com/lililuya/Graduation-Project.git
cd Graduation-Project/env
  • Use conda or pip
# If using conda, edit the prefix field in environment.yml, or delete it to use the default location
conda env create -f environment.yml
# If using pip, first delete the entries that point to a local package index
pip install -r requirements.txt
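
The two manual edits mentioned in the comments above can also be scripted. The sketch below rests on two assumptions that should be checked against the actual files: that the machine-specific line in environment.yml is a top-level `prefix:` key (which is what `conda env export` writes), and that the "local package index" entries in requirements.txt are lines containing a `file://` reference. Both function names are hypothetical.

```python
def strip_conda_prefix(yml_text: str) -> str:
    """Drop the top-level `prefix:` line so conda uses its default env location."""
    kept = [line for line in yml_text.splitlines() if not line.startswith("prefix:")]
    return "\n".join(kept) + "\n"


def strip_local_requirements(req_text: str) -> str:
    """Drop requirement lines that reference a local file:// location.

    Assumption: this is what the "local package index" entries look like.
    """
    kept = [line for line in req_text.splitlines() if "file://" not in line]
    return "\n".join(kept) + "\n"


# Illustrative inputs; the environment name and paths are made up.
sample_yml = "name: example-env\ndependencies:\n  - python=3.9.19\nprefix: /home/user/envs/example-env\n"
sample_req = "funasr==1.0.22\nsomepkg @ file:///tmp/build/somepkg\n"

print(strip_conda_prefix(sample_yml), end="")
print(strip_local_requirements(sample_req), end="")
```

To apply this to the real files, read each one with `pathlib.Path.read_text()`, pass the text through the matching function, and write the result back with `write_text()`.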

2.2 Environment conflicts between ModelScope and GPT-SOVITS: ModelScope's versions take precedence

pip install funasr==1.0.22
pip install modelscope==1.13.3
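
After installing, it can help to confirm that the two pinned versions are the ones actually active in the environment, since a later install may silently replace them. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are the two versions given above.

```python
from importlib.metadata import PackageNotFoundError, version

# The versions pinned in this section.
PINNED = {"funasr": "1.0.22", "modelscope": "1.13.3"}


def check_pins(pins, get_version=version):
    """Return {package: message} for each package that is missing or at another version."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (want {wanted})"
            continue
        if found != wanted:
            problems[name] = f"found {found}, want {wanted}"
    return problems


# Prints nothing when both packages are installed at the pinned versions.
for name, message in check_pins(PINNED).items():
    print(f"{name}: {message}")
```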

2.3 TensorRT installation

See the TensorRT installation notes.

2.4 Known problems

  • The most likely problem is a numba version mismatch; if it occurs, upgrade numba:
pip install -U numba

3. Weight files

4. Running

4.1 Run locally

python whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py

4.2 Using Gradio's built-in tunneling (public share link)

# In the script, set share=True in the Gradio launch() call
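
Gradio exposes its tunneling through the `share` argument of `launch()`: with `share=True` it prints a temporary public URL in addition to the local one. The sketch below shows that call in isolation. The `echo` handler, `launch_options` and `serve` are hypothetical placeholders, and the interface in the actual script will look different.

```python
def launch_options(public: bool) -> dict:
    """Keyword arguments for Gradio's launch(); share=True requests a public link."""
    return {"share": public}


def echo(text: str) -> str:
    """Placeholder handler standing in for the real pipeline."""
    return text


def serve(public: bool = False) -> None:
    """Build a one-field demo app and start it, optionally with a public link."""
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    demo = gr.Interface(fn=echo, inputs="text", outputs="text")
    demo.launch(**launch_options(public))


# serve(public=True)  # uncomment to start the app with a public share link
```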

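4.3 Interface

  • Emotional virtual human generation module (screenshot: test_record)
  • Chinese and English TTS (screenshot: test_TTS_en)
  • Chinese and English ASR (screenshot: page4)
  • Image matting (screenshot: page3)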

References

@InProceedings{Gan_2023_ICCV,
    author    = {Gan, Yuan and Yang, Zongxin and Yue, Xihang and Sun, Lingyun and Yang, Yi},
    title     = {Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {22634-22645}
}

@InProceedings{wang2021gfpgan,
    author = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
    title = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
    booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

@inproceedings{gao22b_interspeech,
  author={Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={2063--2067},
  doi={10.21437/Interspeech.2022-9996}
}

@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}

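Related repositories

Current problems

  • Synchronization problems with Chinese speech; see the related issue.
  • Running the system needs 6 GB + 10 GB of GPU memory, which is too much.
  • The results shown here are not very good, because the chosen source images are not very sharp and the super-resolution model loses some precision when run through ONNX.
  • Blending the head back into the body; see the EAT authors' suggestion.
  • Background jitter; see the EAT authors' suggestion. This repository uses the MODNet approach.
  • DeepSpeech acceleration: extracting audio features currently takes a very long time; the deepspeech-0.1 version is used.
  • Custom loading of GPT-SOVITS models: loading currently trades resources for time, and each model is roughly 1.8 GB. The models to load could be listed in a config file.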
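
One way to approach the last item above is to list the voices in a config file and load each model only when it is first requested, keeping a bounded number in memory. The sketch below is an illustration under stated assumptions, not the repository's implementation: the config layout, the file paths, and the `load_weights` and `get_voice` helpers are all hypothetical, and `load_weights` is a stand-in that loads nothing.

```python
import json
from functools import lru_cache

# Hypothetical config: voice name -> weight files. Paths are illustrative only.
CONFIG_JSON = """
{
  "liwen": {"gpt": "weights/liwen.ckpt", "sovits": "weights/liwen.pth"},
  "fufu":  {"gpt": "weights/fufu.ckpt",  "sovits": "weights/fufu.pth"}
}
"""
VOICES = json.loads(CONFIG_JSON)
load_calls = []  # records which voices were actually loaded, for demonstration


def load_weights(gpt_path: str, sovits_path: str) -> dict:
    """Stand-in for the real (slow, memory-hungry) model loading."""
    return {"gpt": gpt_path, "sovits": sovits_path}


@lru_cache(maxsize=1)  # keep only the most recently used voice in memory
def get_voice(name: str) -> dict:
    """Load a voice on first use; repeated requests for it hit the cache."""
    entry = VOICES[name]
    load_calls.append(name)
    return load_weights(entry["gpt"], entry["sovits"])


get_voice("liwen")
get_voice("liwen")  # served from the cache, no second load
get_voice("fufu")   # evicts "liwen" and loads "fufu"
print(load_calls)
```

Raising `maxsize` keeps more voices resident and makes switching faster at the cost of memory, so the value is the knob for the resources-versus-time trade-off described above.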

Statement

This project uses EAT as its core model and is mainly an experimental exploration; it has no other purpose.
