This repository contains the code for our project on audio steganography. Most of our work is based on this paper. This work was done as part of CS753: Automatic Speech Recognition at IIT Bombay.
A major chunk of the repo has been forked from here.
Contributors: Samyak Shah, Rishabh Dahale and Mithilesh Vaidya
We recommend creating a virtual environment and installing the Python requirements there:
virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt
ctc_best: Trained ASR models
examples: a few of the examples mentioned in the presentation. Each recording has its own folder, which contains:
- name.wav: original clean recording e.g. walter.wav
- name_encoded_text.wav: perturbed recording for the encoded text
- name_encoded_text.pkl: pickle file containing loss and PESQ score as a function of the number of iterations
speech: main codebase which contains the ASR model and preprocessing steps
final_presentation.pptx: a brief presentation of our project
stego.py: contains the actual stego algorithm
train.py: file for training the ASR model
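The .pkl files in examples store the loss and PESQ score per iteration, so they can be inspected with a few lines of Python. The sketch below is only illustrative: the internal layout of the pickle is not documented here, so the dict keys "loss" and "pesq" and the helper names load_history and summarise are assumptions. Inspect a real file first and adjust accordingly. The demo writes a small synthetic pickle so that it runs without the repository data.

```python
import pickle
import tempfile
from pathlib import Path


def load_history(pkl_path):
    """Load a pickled optimisation history and return (loss, pesq) lists.

    ASSUMPTION: the pickle holds a dict with "loss" and "pesq" keys, each a
    list with one value per iteration. The real files may be laid out
    differently; check with pickle.load and adapt the keys.
    """
    with open(pkl_path, "rb") as f:
        history = pickle.load(f)
    return list(history["loss"]), list(history["pesq"])


def summarise(loss, pesq):
    """Return the iteration with the lowest loss and the PESQ score there."""
    best_iter = min(range(len(loss)), key=loss.__getitem__)
    return {"best_iter": best_iter, "loss": loss[best_iter], "pesq": pesq[best_iter]}


# Demo on a synthetic file (made-up numbers, not results from the project).
demo_path = Path(tempfile.mkdtemp()) / "demo_encoded_text.pkl"
with open(demo_path, "wb") as f:
    pickle.dump({"loss": [5.0, 2.5, 1.0, 1.2], "pesq": [4.4, 4.1, 3.8, 3.7]}, f)

loss, pesq = load_history(demo_path)
summary = summarise(loss, pesq)
print(summary)
```

Only unpickle files you trust, since pickle.load can execute arbitrary code.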
stego.py contains both algorithms for calculating the perturbation: one in the time domain and one in the spectral domain.
It takes as input a path to the audio recording and a list of phones to encode.
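To give a rough feel for what a time-domain perturbation search looks like, here is a toy sketch. It is not the code in stego.py: the linear scorer project stands in for the ASR model and its loss, and the function name perturb_time_domain, the clipping bound eps, the learning rate and the step count are all hypothetical choices made for illustration.

```python
def project(signal, weights):
    """Toy stand-in for an ASR model: a single linear score of the waveform."""
    return sum(w * s for w, s in zip(weights, signal))


def perturb_time_domain(signal, weights, target, eps=0.05, lr=0.01, steps=200):
    """Gradient descent on an additive perturbation, clipped to [-eps, eps].

    Minimises (project(signal + delta) - target) ** 2 and records the loss
    at every iteration. All hyperparameters are illustrative assumptions.
    """
    delta = [0.0] * len(signal)
    losses = []
    for _ in range(steps):
        perturbed = [s + d for s, d in zip(signal, delta)]
        err = project(perturbed, weights) - target
        losses.append(err * err)
        # d(loss)/d(delta_i) = 2 * err * weights[i]; step, then clip so the
        # perturbation stays small in amplitude.
        delta = [
            max(-eps, min(eps, d - lr * 2.0 * err * w))
            for d, w in zip(delta, weights)
        ]
    return delta, losses


# Made-up four-sample "recording" and target score.
signal = [0.1, -0.2, 0.3, 0.0]
weights = [1.0, 0.5, -1.0, 2.0]
target = -0.2

delta, losses = perturb_time_domain(signal, weights, target)
print(f"first loss: {losses[0]:.6f}, last loss: {losses[-1]:.6f}")
```

In a real setting the loss would come from the ASR model evaluated against the phones to encode, and the gradient would be obtained by automatic differentiation rather than written by hand.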