
KAN-based Fusion of Dual Domain for Audio-Driven Landmarks Generation


What is this:

KAN-based Fusion of Dual Domain for Audio-Driven Landmarks Generation is a model that generates a sequence of facial landmarks from an audio input.

Overview of our model: the input is audio and one identity image, and the output is a sequence of landmarks (red) compared with the original landmarks (blue).

[overview figure]
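The "Dual Domain" in the title refers to two representations of the input audio. As a rough sketch only, assuming the two domains are the time-domain waveform and a frequency-domain mel spectrogram and that librosa is available (the repository's actual feature pipeline may differ):

    # Hypothetical sketch: time-domain waveform plus frequency-domain
    # mel spectrogram as the two audio "domains". The repository's actual
    # features, sample rate, and window sizes may differ.
    import librosa

    def dual_domain_features(audio_path, sr=16000, n_mels=80):
        # Time domain: the raw waveform, resampled to a fixed rate.
        waveform, sr = librosa.load(audio_path, sr=sr)
        # Frequency domain: log-scaled mel spectrogram of the same clip.
        mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel)
        return waveform, log_mel

    # "sample.wav" is a placeholder file name, not a file shipped with the repo.
    waveform, log_mel = dual_domain_features("dataset/M030/audio/sample.wav")
    print(waveform.shape, log_mel.shape)  # (T,), (n_mels, frames)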

How to use:

Training

  1. Clone our repository

    git clone https://github.com/RC-Sho0/KFusion-Dual-Domain-for-Speech-to-Landmarks.git
  2. Create the environment

    python3 -m venv venv
    source venv/bin/activate
    pip3 install -r requirements.txt
  3. Prepare the dataset:

    • Download the MEAD dataset here, or get a sample here:
    mkdir dataset && cd dataset
    gdown --fuzzy https://drive.google.com/file/d/1UO2OmsBP6FqMfjDikJsXr-1QObJfzBwQ/view?usp=drive_link
    unzip M030.zip
    • (If you use the full MEAD data, you need to pre-process it with the scripts in the utils folder; see the landmark-extraction sketch after these steps.)
  4. Configure the experiment

    • Edit config/exp.json as follows (a config-loading sketch appears after these steps):
    {
     "datalist": "dataset/duration/fa_datalist.json", //List sample of dataset
     "audio_path": "dataset/M030/audio",
     "landmark_path": "dataset/M030/landmark",
     "video_path": "dataset/M030/video",
     "duration": 1, // Length of the video and audio
     "max_epochs": 2, // Maximum number of epochs
     "batch": 4, // Batch size
    
     "init_lr": 3e-4, // Initial learning rate
     "val_epochs": 4, // Number of epochs to validate
     "save_weights": "weights", // Folder to save the weights
    
     "is_test": true // For development
    }
  5. Training

    • Run this command
    python3 main.py --config "config/exp.json" 
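
The MEAD pre-processing mentioned in step 3 is handled by the scripts in the utils folder. As a rough sketch of the kind of per-frame landmark extraction involved, assuming the face_alignment and opencv-python packages (the actual utils scripts may work differently):

    # Hypothetical sketch of per-frame facial landmark extraction from a
    # MEAD video; the repository's utils scripts may use a different
    # detector or output format.
    import cv2
    import face_alignment

    # Recent face_alignment versions use LandmarksType.TWO_D
    # (older releases spell it LandmarksType._2D).
    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cpu")

    def extract_landmarks(video_path):
        cap = cv2.VideoCapture(video_path)
        all_frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # face_alignment expects RGB; OpenCV decodes frames as BGR.
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            preds = fa.get_landmarks(rgb)
            if preds:  # keep the first detected face's 68 x 2 landmarks
                all_frames.append(preds[0])
        cap.release()
        return all_frames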
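
Note that the // comments in the config example above are illustrative only; strict JSON parsers reject them, so drop them from the real file. A minimal sketch of how main.py might read the config (the actual argument handling may differ):

    # Minimal sketch of loading config/exp.json the way main.py might;
    # the repository's actual argument handling may differ.
    import argparse
    import json

    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="config/exp.json")
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = json.load(f)  # fails if the file still contains // comments

    print(cfg["max_epochs"], cfg["batch"], cfg["init_lr"])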

Inference

  • Download the pretrained weights from here
  • Configure the test file following the example at config/infer.json
  • Run this command:
    python3 inference.py --config config/infer.json
    

Demo:

You can check stats/infer_results/240625*, which compares the results with the ground truth.

The first column is the identity video (not used by the model), the second is the ground truth, and the last is the prediction.
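
To make a similar comparison yourself, here is a minimal matplotlib sketch that overlays predicted landmarks (red) on ground-truth landmarks (blue) for one frame; the file names and the (num_frames, 68, 2) array shape are assumptions, not the repository's actual saved format:

    # Hypothetical sketch: overlay predicted landmarks (red) on ground
    # truth (blue), mirroring the red/blue comparison in the overview.
    # File names and the (num_frames, 68, 2) shape are assumptions.
    import numpy as np
    import matplotlib.pyplot as plt

    pred = np.load("pred_landmarks.npy")  # placeholder file name
    gt = np.load("gt_landmarks.npy")      # placeholder file name

    frame = 0
    plt.scatter(gt[frame, :, 0], gt[frame, :, 1], c="blue", s=8, label="ground truth")
    plt.scatter(pred[frame, :, 0], pred[frame, :, 1], c="red", s=8, label="prediction")
    plt.gca().invert_yaxis()  # image coordinates: y grows downward
    plt.legend()
    plt.show()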

Please star and follow if this repository is helpful to you.

Authored by sowwn
