VOCALS detects stuttering in user-recorded or uploaded WAV audio using a custom-trained model for real-time analysis.

theakash04/stuttering-detection-API

VOCALS

VOCALS is an application that lets users record their voice or upload audio files in WAV format. The audio is sent to the backend, where a custom-trained model classifies it as either "stuttering" or "non-stuttering".
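Because the application accepts only WAV input, a backend would typically check an upload before handing it to the model. The following is a minimal sketch of such a check using Python's standard `wave` module. The function names `describe_wav` and `make_silent_wav` are hypothetical and are not taken from the VOCALS code base; the actual backend may validate uploads differently, or not at all.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse raw WAV bytes and return basic properties.

    Hypothetical helper for illustration. Raises ValueError when the
    bytes cannot be read as a WAV file.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            if rate <= 0:
                raise ValueError("WAV file reports an invalid sample rate")
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "duration_seconds": wav.getnframes() / rate,
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError("not a readable WAV file") from exc


def make_silent_wav(seconds: int = 1, sample_rate: int = 16000) -> bytes:
    """Build a mono, 16-bit WAV of silence in memory (test data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * seconds))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

Rejecting unreadable files at this stage keeps malformed uploads away from the model and lets the API return a clear error to the frontend.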

Demonstration

2025-02-18.21-47-27.mp4

Project Overview

  • Frontend:
    Built with Next.js, the frontend offers an intuitive interface for users to either record their voice directly or upload pre-recorded audio files.

  • Backend:
    Developed using FastAPI in Python, the backend hosts a modified version of the openai-whisper-tiny ASR model. The model has been adapted from speech recognition into a classifier and retrained with PyTorch on the SEP-28k dataset from Hugging Face. During testing, the model achieved 92% accuracy in distinguishing between stuttering and non-stuttering audio.

  • User Feedback:
    The application uses the Gemini API to generate feedback for users. Generation settings such as the temperature have been tuned to improve the quality of that feedback.
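The last step of a two-class classifier like the one described above is to turn the model's raw output scores (logits) into a label and a confidence value. The sketch below shows that step in plain Python with a numerically stable softmax. The label order in `LABELS` and the function name `label_from_logits` are assumptions made for illustration; the trained VOCALS model may order its outputs differently or post-process them in another way.

```python
import math

# Assumed output order: index 0 = non-stuttering, index 1 = stuttering.
LABELS = ("non-stuttering", "stuttering")


def label_from_logits(logits):
    """Convert two raw scores into (label, confidence) via softmax."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Pick the index with the highest probability (first index on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    label, confidence = label_from_logits([-1.0, 2.0])
    print(f"{label} ({confidence:.2%})")
```

Returning the confidence alongside the label gives the frontend, and any feedback step that follows, more to work with than a bare class name.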

Technologies Used

  • Frontend: Next.js
  • Backend: FastAPI (Python)
  • Model Training: PyTorch

Model Training Code

For further details about the implementation, please refer to the code (@Rishabh-Gi-t).

License
