TCU-ClassifAI/model-comparisons

Model Comparisons

Compares the performance of speech-to-text (STT) transcription models on classroom audio data.

Data

https://tcu.app.box.com/file/1339927312124?s=gdr527kvqai17wtnhqc03kr8yq2iwc2h

The audio is completely unprocessed.

Models

AWS Transcribe 1

Whisper 1

  • https://github.com/openai/whisper/
  • Used the Large v2 model with no customization and no special vocabulary. Results are in whisper_1.json.
    • Took around 10 minutes to run on 1 hour of audio on ml.cs.tcu.edu.
    • Result: better than AWS Transcribe 1.
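
A minimal sketch of how a run like this could be reproduced with the `openai-whisper` Python package. The audio file name `classroom.wav` and the helper names `transcribe` and `save_result` are assumptions for illustration; this README does not record the exact command that was used.

```python
import json
from pathlib import Path


def transcribe(audio_path, model_name="large-v2"):
    """Transcribe one audio file with a stock Whisper model (no custom vocabulary)."""
    # Imported lazily so the rest of the module works without the package installed.
    # Assumes: pip install openai-whisper (plus ffmpeg on the PATH).
    import whisper

    model = whisper.load_model(model_name)
    # Returns a dict with "text", "segments", and "language" keys.
    return model.transcribe(str(audio_path))


def save_result(result, out_path):
    """Write the transcription result to a JSON file."""
    Path(out_path).write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )


# Example usage (hypothetical input file name):
#   result = transcribe("classroom.wav")
#   save_result(result, "whisper_1.json")
```

Run time depends heavily on the hardware; the 10-minute figure above is specific to ml.cs.tcu.edu.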

Whisper 2

  • First applied noise reduction to the audio.
  • Then used the Large v2 model with no customization and no special vocabulary. Results are in whisper_2.json.
    • Took around 10 minutes to run on 1 hour of audio on ml.cs.tcu.edu.
    • Result: worse than Whisper 1.
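
This README does not say which noise-reduction method was applied. Purely as an illustration of what a preprocessing step of this kind does, here is a sketch of a simple frame-based noise gate; it is an assumption, not the method used for Whisper 2. The function name, frame size, and threshold are all hypothetical.

```python
import math


def noise_gate(samples, frame_size=512, threshold=0.02):
    """Silence frames whose RMS energy is below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0]. Frames at or above the
    threshold pass through unchanged; quieter frames are replaced by zeros.
    """
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            gated.extend(frame)
        else:
            gated.extend([0.0] * len(frame))
    return gated
```

A gate like this removes low-level background hiss between utterances but also discards any quiet speech that falls under the threshold.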

Files

Results

I used word error rate (WER) to compare the models' output. The lower the WER, the better the transcription.

Unfortunately, I do not have a ground-truth transcript for this audio.
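
For reference, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of words in the reference. The sketch below computes it with a word-level edit distance in plain Python. The function name `wer` and the lowercase-and-split normalization are assumptions, not necessarily what was used for this comparison.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Without a ground-truth transcript, a function like this can only measure how much two models' transcripts disagree with each other (treating one as the reference), not which one is closer to what was actually said.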
