| driven image | original | gfpgan |
|---|---|---|
|  | original_comic.mp4 | gpfgan_comic.mp4 |
| driven image | original | gfpgan |
|---|---|---|
|  | talking_2024-05-14-16-34-25_liuyin.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4 |
| driven image | happy | scared | neutral |
|---|---|---|---|
|  | happy_sound.mp4 | scared_sound.mp4 | neural_sound.mp4 |
| liwen | fufu | liuying |
|---|---|---|
| me_no_back.mp4 | fufu.mp4 | talking_restoration_2024-05-14-16-34-34_liuyin.mp4 |
| Chinese | English |
|---|---|
| talking_2024-05-03-05-14-59.mp4 | talking_2024-05-06-22-12-02.mp4 |
| pose1 | pose2 | pose3 |
|---|---|---|
| pose1.mp4 | pose2.mp4 | pose3.mp4 |
- System environment
  - Python 3.9.19
  - Ubuntu 20.04.1
  - GPU: 2 × NVIDIA RTX 4090
```bash
git clone https://github.com/lililuya/Graduation-Project.git
cd Graduation-Project/env
```
- Use conda or pip:

```bash
# If using conda, edit the `prefix` field in environment.yml (or delete it
# to install the environment to the default location)
conda env create -f environment.yml

# If using pip, first remove the local package index entries from requirements.txt
pip install -r requirements.txt
pip install funasr==1.0.22
pip install modelscope==1.13.3
```
- The most likely problem is a numba version mismatch; if it occurs, upgrade numba:

```bash
pip install -U numba
```
- EAT weights
  - After downloading, place them under `ckpt` in the project root.
- GFPGAN weights
  - After downloading, place them under `restoration` in the project root.
- GPT-SoVITS weights
  - After downloading, place them under `GPT_SoVits/weights` in the project root.
- MODNet weights
  - After downloading, place them under `pretrain`.
- DeepSpeech
  - Refer to RAD-NeRF.
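Taken together, the placement steps above assume a directory layout like the one sketched below. This minimal Python snippet pre-creates those directories (names taken from this README; the checkpoint filenames themselves depend on each download):

```python
from pathlib import Path

# Directory names from the weight-placement steps above; run from the repo root.
WEIGHT_DIRS = [
    "ckpt",                # EAT weights
    "restoration",         # GFPGAN weights
    "GPT_SoVits/weights",  # GPT-SoVITS weights
    "pretrain",            # MODNet weights
]

for d in WEIGHT_DIRS:
    Path(d).mkdir(parents=True, exist_ok=True)
```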
```bash
# Set launch=True inside the script before running
python whole_pipeline_GPTSOVITS_asr_en_gradio_multivoice.py
```
- For some configuration options, refer to Gradio network traversal (exposing the app beyond localhost).
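The network-traversal settings amount to a few arguments to Gradio's `launch()`. A hedged sketch follows; the `demo` object and the exact values are assumptions, not taken from the pipeline script:

```python
# Typical Gradio options for making the app reachable over the network.
# `demo` stands in for the gr.Blocks/gr.Interface object built by the script.
launch_kwargs = dict(
    server_name="0.0.0.0",  # listen on all network interfaces
    server_port=7860,       # Gradio's default port
    share=True,             # also create a temporary public *.gradio.live link
)
# demo.launch(**launch_kwargs)
```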
```bibtex
@InProceedings{Gan_2023_ICCV,
  author    = {Gan, Yuan and Yang, Zongxin and Yue, Xihang and Sun, Lingyun and Yang, Yi},
  title     = {Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {22634-22645}
}

@InProceedings{wang2021gfpgan,
  author    = {Xintao Wang and Yu Li and Honglun Zhang and Ying Shan},
  title     = {Towards Real-World Blind Face Restoration with Generative Facial Prior},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}

@inproceedings{gao22b_interspeech,
  author    = {Zhifu Gao and ShiLiang Zhang and Ian McLoughlin and Zhijie Yan},
  title     = {Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  booktitle = {Proc. Interspeech 2022},
  year      = {2022},
  pages     = {2063--2067},
  doi       = {10.21437/Interspeech.2022-9996}
}

@inproceedings{du2022glm,
  title     = {GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author    = {Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle = {Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages     = {320--335},
  year      = {2022}
}
```
- Chinese lip-sync problem; see the related issue.
- Running requires 6 GB + 10 GB of GPU memory; the memory footprint is too large.
- The results shown above are not great, because the chosen source images are blurry and the super-resolution model loses some precision when exported to ONNX.
- Blend the generated head back into the body (suggested by the EAT authors).
- Fix background jitter (noted by the EAT authors); this repo adopts the MODNet matting approach.
- Speed up DeepSpeech: extracting audio features currently takes a very long time (using deepspeech-0.1).
- Custom loading of GPT-SoVITS models, trading memory for speed: each model is roughly 1.8 GB, and loading could be driven by a config file.
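The last item (config-driven model loading, trading memory for speed) could look roughly like the sketch below; the config mapping, cache, and loader are all hypothetical, with the real checkpoint load left as a placeholder:

```python
# Cache of already-loaded models: each GPT-SoVITS checkpoint is ~1.8 GB,
# so each one is loaded once on first request and then reused.
_cache = {}

def get_voice_model(name, config):
    """Return the model for `name`, loading it on first use.

    `config` is a hypothetical {voice name: checkpoint path} mapping,
    e.g. read from a JSON/YAML config file.
    """
    if name not in _cache:
        path = config[name]
        # Placeholder for the real loader, e.g. torch.load(path)
        _cache[name] = {"path": path}
    return _cache[name]
```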
This project uses EAT as its core model; it is purely an experimental exploration and serves no other purpose.