- Original version: `pip install tensorflow==1.9 keras==2.1.5`
- Updated 2020.04.20: `pip install tensorflow==1.15.2 keras`
- or `pipenv install` if you have pipenv
- Download ffmpeg from [ffmpeg](https://ffmpeg.org/); select the static-linking build and you should get a zip file.
- Extract the zip file into the `ffmpeg` folder, so that `ffmpeg/bin/ffmpeg.exe` exists.
- Download sox from SoX (Sound eXchange); you should get a zip file.
- Extract the zip file into the `sox` folder, so that `sox/sox.exe` exists.
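As an optional sanity check, a minimal Python sketch (paths follow the layout described above) to confirm both tools are in place:

```python
import subprocess
from pathlib import Path

# Confirm the tools landed where the steps above expect (Windows layout).
assert Path("ffmpeg/bin/ffmpeg.exe").exists(), "ffmpeg.exe not found"
assert Path("sox/sox.exe").exists(), "sox.exe not found"

# Both binaries print their version when asked.
subprocess.run(["ffmpeg/bin/ffmpeg.exe", "-version"], check=True)
subprocess.run(["sox/sox.exe", "--version"], check=True)
```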
Convert recorded audio files to *.wav files
$ python ./convert_file.py <Data Folder>
The Data Folder should contain subfolders where your audio files reside. Typically, one of your audio files might be `<Data Folder>/group1/a.mp3`.
The results of the conversion are written to `./data/train/`. You should manually move some of them to `./data/test` to get a training/validation split. What fraction of files you move is up to you (a sketch follows).
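For convenience, a minimal sketch of such a split. It assumes the converted wav files sit directly in `data/train/`, and the 10% fraction is an arbitrary example, not something this repo prescribes:

```python
import random
import shutil
from pathlib import Path

def split_validation(train_dir="data/train", test_dir="data/test", fraction=0.1):
    """Move a random fraction of wav files from train_dir to test_dir."""
    files = list(Path(train_dir).glob("*.wav"))  # adjust if files live in subfolders
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    for f in random.sample(files, int(len(files) * fraction)):
        shutil.move(str(f), test_dir)

split_validation()
```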
The data augmentation server is implemented with gRPC.
$ pip install grpcio
or, for some Python 3 installations,
$ pip3 install grpcio
Training involves two parts: train.py and the augmentation/ package.
`$ python -m augmentation` will start an augmentation server that provides training and test data.
train.py will connect to the augmentation server and request data.
augmentation/config.py configures the batch size, thread count, data source, and so on (an illustrative sketch follows).
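For illustration only, the kind of knobs a config module like this typically exposes; the actual names and values live in the repo's augmentation/config.py and may differ:

```python
# augmentation/config.py (illustrative sketch; real names may differ)
BATCH_SIZE = 32                 # samples per batch served to train.py
THREAD_SIZE = 4                 # worker threads generating augmented audio
TRAIN_DATA_DIR = "data/train/"  # source of training wav files
TEST_DATA_DIR = "data/test/"    # source of validation wav files
```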
Before training, there are several things you should do. You already did them in Data preparation; now check them again.

- Put training data into data/train/.
- Put validation data into data/test/.
- NOTE: every wav file must be encoded as 16-bit signed integer PCM, mono-channel, at a sampling rate of 16000 Hz (a quick check follows this list). The files should already be correct if you obtained them from convert_file.py.
- You should have sox in sox/; check it again.
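If your wav files came from somewhere other than convert_file.py, this standard-library sketch (the path is an example) verifies the expected encoding:

```python
import wave

def check_wav(path):
    """Raise if a wav file is not 16-bit signed PCM, mono, 16 kHz."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        assert w.getnchannels() == 1, "expected mono audio"
        assert w.getframerate() == 16000, "expected 16000 Hz sampling rate"

check_wav("data/train/example.wav")  # example path
```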
server side: $ python -m augmentation
- this will start an augmentation server utilizing sox.

client side: $ python train.py
- this will start training with data requested from the augmentation server.
- NOTE: run both from the audioNet folder.
** Resume an interrupted training process.
You can resume from a checkpoint: modify the last line of train.py and set -1 (negative one) as your start point.
modify webfront.py, change MODEL_ID to yours.
run `python webfront.py`.
open a web browser and enter the URL http://127.0.0.1:5000/predict.
*It requires [ffmpeg](https://ffmpeg.org/) for audio file format conversion.
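To hit the endpoint from code instead of a browser, a hedged sketch using the requests library; the form field name "file" is an assumption, so check webfront.py for the field it actually reads:

```python
import requests

# POST an audio file to the local demo server started by webfront.py.
# The field name "file" is a guess; inspect webfront.py for the real one.
with open("sample.wav", "rb") as f:
    resp = requests.post("http://127.0.0.1:5000/predict", files={"file": f})
print(resp.status_code, resp.text)
```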
** Select Checkpoint for Evaluation
modify webfront.py, change MODEL_ID to yours.
Then see the Run `python webfront.py` step above.
- Choose an ID of a checkpoint yourself from models/save_<ID>.h5.
- Run `$ python ./create_pb.py <ID>`. This will create the file models/model.pb.
- Place your model.pb file where you want to deploy. Typically, see the Android mobile example: androidAudioRecg.
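As a minimal sketch of consuming the exported graph with TF 1.x (matching the versions installed above); the tensor names needed for inference depend on the model, so list the graph nodes to find them:

```python
import tensorflow as tf  # TF 1.x, matching the versions installed above

# Load the frozen graph produced by create_pb.py.
with tf.gfile.GFile("models/model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

# Input/output tensor names depend on the model; list nodes to find them.
print([n.name for n in graph_def.node])
```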