KAN-based Fusion of Dual Domain for Audio-Driven Landmarks Generation
This model generates a sequence of facial landmarks from audio input.
Overview of our model: the input is audio and one identity image; the output is a sequence of landmarks (red) compared with the original landmarks (blue).
- Clone our repository
git clone https://github.com/RC-Sho0/KFusion-Dual-Domain-for-Speech-to-Landmarks.git
- Create the environment
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
- Prepare the datasets:
mkdir dataset && cd dataset
gdown --fuzzy https://drive.google.com/file/d/1UO2OmsBP6FqMfjDikJsXr-1QObJfzBwQ/view?usp=drive_link
unzip M030.zip
- (If you use the MEAD dataset, you need to preprocess it with the scripts in the utils folder; a rough sketch of the idea follows below.)
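The actual preprocessing lives in the repository's utils folder; the sketch below only illustrates what it has to produce under assumed naming (per-clip 16 kHz .wav files and per-frame landmark arrays matching the audio_path/landmark_path layout used in exp.json). The detect_landmarks helper is a placeholder for any 68-point facial landmark detector.

```python
# Hypothetical sketch of MEAD preprocessing; the real steps are in the repo's utils folder.
# Assumes each raw clip is a video file and produces a 16 kHz .wav plus a
# (num_frames, 68, 2) landmark array, mirroring dataset/M030/{audio,landmark}.
import subprocess
from pathlib import Path

import cv2
import numpy as np


def detect_landmarks(frame):
    """Placeholder: run any 68-point facial landmark detector here (e.g. dlib, face_alignment)."""
    raise NotImplementedError


def preprocess_clip(video_path: Path, out_root: Path):
    clip = video_path.stem
    (out_root / "audio").mkdir(parents=True, exist_ok=True)
    (out_root / "landmark").mkdir(parents=True, exist_ok=True)

    # 1) Extract mono 16 kHz audio with ffmpeg.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video_path), "-ar", "16000", "-ac", "1",
         str(out_root / "audio" / f"{clip}.wav")],
        check=True,
    )

    # 2) Extract per-frame facial landmarks.
    cap = cv2.VideoCapture(str(video_path))
    landmarks = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        landmarks.append(detect_landmarks(frame))
    cap.release()
    np.save(out_root / "landmark" / f"{clip}.npy", np.asarray(landmarks))
```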
- Configure the experiment
- Edit config/exp.json as follows:
{ "datalist": "dataset/duration/fa_datalist.json", //List sample of dataset "audio_path": "dataset/M030/audio", "landmark_path": "dataset/M030/landmark", "video_path": "dataset/M030/video", "duration": 1, // Length of the video and audio "max_epochs": 2, // Maximum number of epochs "batch": 4, // Batch size "init_lr": 3e-4, // Initial learning rate "val_epochs": 4, // Number of epochs to validate "save_weights": "weights", // Folder to save the weights "is_test": true // For development }
- Training
- Run this command:
python3 main.py --config "config/exp.json"
- Inference
- Download the weights from here
- Configure the test file like the example at config/infer.json (a hypothetical illustration follows the command below)
- Run this command:
python3 inference.py --config config/infer.json
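config/infer.json in the repository is the authoritative example; the block below is only a hypothetical illustration modelled on the exp.json fields above, with assumed keys for the downloaded weights and the output folder.

```json
{
  "datalist": "dataset/duration/fa_datalist.json",
  "audio_path": "dataset/M030/audio",
  "landmark_path": "dataset/M030/landmark",
  "video_path": "dataset/M030/video",
  "duration": 1,
  "batch": 4,
  "weights": "weights/checkpoint.pth",    // Hypothetical key: path to the downloaded weights
  "save_results": "stats/infer_results"   // Hypothetical key: where inference outputs are written
}
```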
You can check the outputs in stats/infer_results/240625*, which compare the predictions with the ground truth.
The first column is the identity video (not used by the model), the second is the ground truth, and the last is the prediction.
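Beyond the visual comparison, a quick quantitative check is the mean Euclidean distance between predicted and ground-truth landmarks. The snippet below is a generic sketch; the .npy file names are placeholders, not the repository's actual output format.

```python
# Generic sketch: mean per-point Euclidean distance between predicted and ground-truth
# landmark sequences of shape (num_frames, num_points, 2). File names are placeholders;
# adapt them to whatever inference.py actually writes under stats/infer_results/.
import numpy as np

pred = np.load("pred_landmarks.npy")  # (T, 68, 2), placeholder path
gt = np.load("gt_landmarks.npy")      # (T, 68, 2), placeholder path

dist = np.linalg.norm(pred - gt, axis=-1)  # per-frame, per-point distance
print(f"mean landmark distance: {dist.mean():.3f} px")
```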
Please star and follow if this repository is helpful to you.