Gesture-to-Audio Synthesis

This repository contains code for synthesizing audio from gesture data, trained on user-provided paired gesture/audio recordings.

Installation

  1. Clone the repository:
    git clone https://github.com/mhrice/gesture-to-audio.git
    cd gesture-to-audio
  2. Create and activate a virtual environment:
    python3 -m venv env
    source env/bin/activate
  3. Install the required packages (a quick sanity check follows this list):
    pip install -e .
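
To verify the editable install, you can ask pip for the package metadata. This assumes the package is registered under the name gesture-to-audio; adjust if the project's setup uses a different name:

pip show gesture-to-audio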

If pip accidentally pulls in GPU builds of torch (and their driver dependencies) and you hit errors, recreate the environment and install torch explicitly first:

deactivate
rm -rf env
python3 -m venv env
source env/bin/activate
pip install torch torchaudio torchvision
pip install -e .
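
To confirm which build of torch ended up in the environment, a quick check using the standard torch API:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"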

For training logs, you'll need a free Weights & Biases account. Set up your API key:

wandb login
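
For non-interactive setups (e.g. a remote training machine), wandb also reads the API key from the WANDB_API_KEY environment variable:

export WANDB_API_KEY=<your-api-key>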

Usage

  1. Prepare your paired gesture/audio dataset. To capture with your microphone and camera, run the scripts/record_dataset.py script. This will save the dataset in a folder named recorded_dataset/ by default. The --duration argument is in milliseconds:

    python scripts/record_dataset.py --duration 200
  2. Preprocess the dataset:

    python scripts/process_dataset.py recorded_dataset
  3. Train the model:

    python scripts/train.py recorded_dataset --duration 200

    This will save model checkpoints in the gesture-to-audio/ directory.

  4. Synthesize audio from new gesture data (a checkpoint-inspection sketch follows this list):

    python scripts/non_realtime_test.py /path/to/your/checkpoint.ckpt --duration 200
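
Before running inference, it can help to sanity-check a checkpoint. The sketch below is a minimal, generic inspection assuming only that the .ckpt file is a standard torch pickle (as PyTorch Lightning checkpoints are); the key names it prints are whatever the training script stored, not something this README guarantees:

import torch

# Load the checkpoint on CPU so no GPU is required for inspection.
# On torch >= 2.6 you may need weights_only=False to unpickle a full
# Lightning checkpoint rather than bare weights.
ckpt = torch.load("/path/to/your/checkpoint.ckpt", map_location="cpu", weights_only=False)

# Lightning-style checkpoints are dicts; list the top-level keys
# (commonly "state_dict", "epoch", "optimizer_states", ...).
print(list(ckpt.keys()))

# If a state_dict is present, print parameter names and shapes.
for name, tensor in ckpt.get("state_dict", {}).items():
    print(name, tuple(tensor.shape))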

Existing checkpoints for my paired audio/gesture data can be found here.

Diagrams

Training

(training pipeline diagram)

Inference

(inference pipeline diagram)

More project details here.
