This repository contains code for synthesizing audio from gesture data, using user-defined paired gesture/audio data.
- Clone the repository:
  ```
  git clone https://github.com/mhrice/gesture-to-audio.git
  cd gesture-to-audio
  ```
- Create and activate a virtual environment:
  ```
  python3 -m venv env
  source env/bin/activate
  ```
- Install the required packages:
  ```
  pip install -e .
  ```
If you run into issues from accidentally installing GPU-specific torch builds, recreate the environment and install torch explicitly first:
```
deactivate
rm -rf env
python3 -m venv env
source env/bin/activate
pip install torch torchaudio torchvision
pip install -e .
```
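To confirm the environment ended up in the state you expect, a quick version check like the one below can help; it only uses standard torch/torchaudio attributes and reports whether CUDA is visible to this install.

```python
# Sanity check for the torch installation: print versions and CUDA visibility.
import torch
import torchaudio

print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("cuda available:", torch.cuda.is_available())
```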
For training logs, you'll need a free Weights & Biases account. Set up your API key:
```
wandb login
```
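Once you are logged in, the training script should pick up your credentials automatically; what it actually logs is defined in `scripts/train.py`. Purely for orientation, metric logging with the `wandb` library generally looks like the sketch below. The project name, run name, and metric key are placeholders, not values taken from this repository.

```python
import wandb

# Placeholder project/run names; the names used by scripts/train.py may differ.
run = wandb.init(project="gesture-to-audio", name="example-run")

for step in range(100):
    loss = 1.0 / (step + 1)        # dummy value standing in for a real training loss
    wandb.log({"train/loss": loss})

run.finish()
```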
- Prepare your paired gesture/audio dataset. To capture it with the microphone/camera, run the `scripts/record_dataset.py` script. This will save the dataset in a folder named `recorded_dataset/` by default. The `--duration` flag is in milliseconds. (A rough sketch of this kind of capture loop appears after this list.)
  ```
  python scripts/record_dataset.py --duration 200
  ```
- Preprocess the dataset (an illustration of this kind of step also follows the list):
  ```
  python scripts/process_dataset.py recorded_dataset
  ```
- Train the model:
  ```
  python scripts/train.py recorded_dataset --duration 200
  ```
  This will save model checkpoints in the `gesture-to-audio/` directory.
- Synthesize audio from new gesture data (a checkpoint-inspection sketch follows this list):
  ```
  python scripts/non_realtime_test.py /path/to/your/checkpoint.ckpt --duration 200
  ```
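The capture details (camera settings, clip length, file naming) are handled by `scripts/record_dataset.py` itself. Purely as an illustration of one way to capture paired gesture/audio clips, the sketch below grabs a webcam frame with OpenCV while recording a short microphone clip with sounddevice; the output layout and file names here are assumptions, not the format the repository's script actually writes.

```python
import os
import time

import cv2                 # pip install opencv-python
import sounddevice as sd   # pip install sounddevice
import soundfile as sf     # pip install soundfile

DURATION_S = 0.2              # 200 ms clips, mirroring --duration 200
SAMPLE_RATE = 16000
OUT_DIR = "recorded_dataset"  # assumed output folder; the real script may differ

def record_pair(cam: cv2.VideoCapture, index: int) -> None:
    """Capture one webcam frame and one short audio clip, saved side by side."""
    # Start the (non-blocking) audio recording, grab a frame while it runs,
    # then wait for the audio buffer to fill.
    audio = sd.rec(int(DURATION_S * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
    ok, frame = cam.read()
    sd.wait()
    if not ok:
        raise RuntimeError("could not read a frame from the webcam")
    cv2.imwrite(os.path.join(OUT_DIR, f"gesture_{index:04d}.png"), frame)
    sf.write(os.path.join(OUT_DIR, f"audio_{index:04d}.wav"), audio, SAMPLE_RATE)

if __name__ == "__main__":
    os.makedirs(OUT_DIR, exist_ok=True)
    cam = cv2.VideoCapture(0)
    try:
        for i in range(10):     # record ten example pairs
            record_pair(cam, i)
            time.sleep(0.5)     # brief pause between clips
    finally:
        cam.release()
```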
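What `scripts/process_dataset.py` computes is specific to this project. The snippet below only illustrates a common audio-side preprocessing step for this kind of model (resampling plus a log-mel spectrogram via torchaudio); it is not taken from the repository's pipeline, and the sample rate and mel settings are assumptions.

```python
import torch
import torchaudio

TARGET_SR = 16000  # assumed target sample rate

# Illustrative transform: resample to a fixed rate, then take a log-mel spectrogram.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=TARGET_SR, n_mels=64)

def preprocess_clip(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    spec = mel(waveform)
    return torch.log(spec + 1e-6)  # log-compress for a more stable training target
```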
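How a checkpoint is consumed is determined by the model code in this repository, and `scripts/non_realtime_test.py` handles that for you. If you just want to inspect a `.ckpt` before running it (the extension suggests a PyTorch Lightning checkpoint, though the training code is the authority), it can be opened as an ordinary torch pickle, as sketched below.

```python
import torch

CKPT_PATH = "path/to/your/checkpoint.ckpt"  # replace with a real checkpoint path

# Lightning-style .ckpt files are regular torch pickles; weights_only=False is
# needed on recent torch versions because the file may contain more than tensors.
ckpt = torch.load(CKPT_PATH, map_location="cpu", weights_only=False)
print(list(ckpt.keys()))                      # e.g. "state_dict", "epoch", ...
state_dict = ckpt.get("state_dict", ckpt)
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))          # peek at the first few parameter shapes
```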
Existing checkpoints for my paired audio/gesture data can be found here.
More project details can be found here.