
Listen to any audio stream on your machine and print out the transcribed or translated audio.


tomjpalamattam/audioWhisper

 
 


audioWhisper - with fast_whisper

Listen to any audio stream on your machine and print out the transcribed or translated audio. Based on OpenAI's Whisper project. If you want to live transcribe or translate audio from a livestream or URL, you can find it here

Prerequisites

  1. On Windows, enable Stereo Mix before running the script.
  2. Install ffmpeg and add it to your PATH.
  3. Install CUDA on your system (on Arch Linux, the 'cudnn' package).
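
The prerequisites above can be partly checked before launching the script. The following is a minimal sketch and not part of this repo: the helper name missing_tools is hypothetical, and it only checks whether executables are discoverable on PATH. Using nvidia-smi as a rough indicator of an NVIDIA driver install is an assumption; it does not verify that the CUDA or cuDNN libraries are present.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "nvidia-smi" is an assumed proxy for a working NVIDIA driver install.
    missing = missing_tools(["ffmpeg", "nvidia-smi"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and nvidia-smi found on PATH")
```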

Setup

  1. Create or activate a Python environment of your choice.
  2. Clone this repo to your local machine.
  3. Run pip install -r requirements.txt
  4. Run python audioWhisper.py --devices true to print the available devices and find your device_index and channel.
  5. Run python audioWhisper.py. If the index of the Stereo Mix output device is not 2, pass it with --device_index.
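
Step 4 amounts to scanning the device list for the Stereo Mix entry. The sketch below illustrates that lookup under stated assumptions: it assumes the devices are enumerated with PyAudio (this page does not confirm which audio library audioWhisper.py uses), and the helper names find_devices and list_pyaudio_devices are hypothetical. The dictionary keys "index", "name" and "maxInputChannels" are those returned by PyAudio's get_device_info_by_index.

```python
def find_devices(devices, keyword="stereo mix"):
    """Return (index, name, input_channels) for devices whose name contains keyword."""
    matches = []
    for info in devices:
        name = str(info.get("name", ""))
        if keyword.lower() in name.lower():
            matches.append((info["index"], name, info.get("maxInputChannels", 0)))
    return matches


def list_pyaudio_devices():
    """Enumerate audio devices with PyAudio (assumption: PyAudio is installed)."""
    import pyaudio  # imported lazily so find_devices works without it

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


if __name__ == "__main__":
    try:
        devices = list_pyaudio_devices()
    except (ImportError, OSError):
        devices = []
        print("PyAudio is unavailable; nothing to list")
    for index, name, channels in find_devices(devices):
        print(f"device_index={index} channel={channels} name={name}")
```

The printed device_index and channel values are what you would pass to --device_index and --channel.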

Command-line flags

| Flag | Default | Description |
| --- | --- | --- |
| --devices | false | Print all available devices |
| --model | small | Model to use; refer to the model list here |
| --task | transcribe | Either transcribe the audio or translate it to English |
| --device_index | 2 | Index of the output device to listen to and transcribe |
| --channel | 2 | Number of channels of the output device |
| --rate | 44100 | Sampling rate of the output device (Hz) |
| --audioseconds | 5 | Length of each recorded audio file (seconds) |
| --audiocounts | 5 | Number of audio files to save into the output directory |
| --output_dir | "audio" | Output directory for audio files recorded by audioWhisper.py |
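
For illustration, the flags and defaults listed above could be declared with argparse roughly as follows. This is a sketch, not the parser used in audioWhisper.py: the helper names str2bool, build_parser and clip_size_bytes are hypothetical. The size estimate also assumes 16-bit samples (2 bytes each), which this README does not state.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' style strings, as in `--devices true`."""
    text = str(value).strip().lower()
    if text in ("true", "1", "yes", "y"):
        return True
    if text in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a parser mirroring the documented flags and default values."""
    parser = argparse.ArgumentParser(description="Transcribe or translate system audio")
    parser.add_argument("--devices", type=str2bool, default=False)
    parser.add_argument("--model", default="small")
    parser.add_argument("--task", default="transcribe", choices=["transcribe", "translate"])
    parser.add_argument("--device_index", type=int, default=2)
    parser.add_argument("--channel", type=int, default=2)
    parser.add_argument("--rate", type=int, default=44100)
    parser.add_argument("--audioseconds", type=int, default=5)
    parser.add_argument("--audiocounts", type=int, default=5)
    parser.add_argument("--output_dir", default="audio")
    return parser


def clip_size_bytes(rate, seconds, channels, sample_width=2):
    """Raw PCM size of one clip; sample_width=2 assumes 16-bit audio."""
    return rate * seconds * channels * sample_width


# Example: override two flags and estimate the raw size of one recorded clip.
args = build_parser().parse_args(["--task", "translate", "--device_index", "3"])
print(args.task, args.device_index)
print(clip_size_bytes(args.rate, args.audioseconds, args.channel))  # 882000 with the defaults
```

With the default values, each 5-second stereo clip at 44100 Hz holds 220500 frames, or about 882 kB of raw 16-bit audio.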

Bugs and Fixes

Transcription and translation performance depends on your machine and on the model you use. The medium and large models tend to give more accurate and coherent translations, while tiny and small are good enough for transcribing English audio.

  1. Make sure the playback device of your machine is the same as the Stereo Mix device before you run the script.

Performance Test on Ryzen 5 5600G with NVIDIA RTX3060

The translation is not perfect, but it still conveys the main points of the speech. A video demo of this app is on YouTube.

License

The code and the model weights of Whisper are released under the MIT License. See their repo for more information. The code of this repo is also under the MIT License. See LICENSE for further details.
