VOCALS

VOCALS is an application that lets users record their voice or upload audio files in WAV format. The audio is sent to the backend, where a custom-trained model classifies it as either "stuttering" or "non-stuttering".
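
Since uploads are expected in WAV format, it can help to confirm that incoming bytes really are WAV before analysis. The following is a minimal sketch using only the Python standard library. The function names `describe_wav` and `make_silent_wav` are hypothetical and are not taken from this repository, and the 16 kHz sample rate is only an assumed value for the sample input.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic properties of WAV bytes, or raise ValueError if they are not valid WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "duration_seconds": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # wave raises wave.Error for a bad header and EOFError for truncated input
        raise ValueError("not a valid WAV file") from exc


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (sample input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(2.0)))
```

A check like this could run on the backend before the audio reaches the model, so that malformed uploads are rejected early with a clear error.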

Demonstration

2025-02-18.21-47-27.mp4

Project Overview

Frontend:
Built with Next.js, the frontend lets users either record their voice directly or upload pre-recorded audio files.
Backend:
Developed with FastAPI in Python, the backend hosts a modified version of the openai-whisper-tiny ASR model. The model was adapted into a classifier and retrained on the sep-28k dataset from Hugging Face using PyTorch. During testing, it achieved 92% accuracy in distinguishing stuttering from non-stuttering audio.
User Feedback:
The application uses the Gemini API to generate feedback for users. Generation settings such as temperature were tuned to improve the quality of that feedback.
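
To illustrate the classification step, here is a minimal sketch of how two raw class scores (logits) from a binary classifier could be turned into a label and a confidence value. It is not the repository's code: the `LABELS` index order and the function names `softmax` and `to_prediction` are assumptions made for illustration, and the actual model output format may differ.

```python
import math

# Assumed index order of the two classes; the real model may use a different order.
LABELS = ("non-stuttering", "stuttering")


def softmax(logits):
    """Numerically stable softmax over a sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Map one logit per class to the most likely label and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "confidence": probs[best]}


if __name__ == "__main__":
    print(to_prediction([-1.0, 2.0]))
```

A dictionary of this shape could be returned by the backend as JSON and passed along when requesting feedback, though the actual response schema is not specified here.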

Technologies Used

Frontend: Next.js
Backend: FastAPI (Python)
Model Training: PyTorch

Model Training Code

For further details about the implementation, please refer to the model training code (@Rishabh-Gi-t).

theakash04/stuttering-detection-API

Folders and files

Latest commit

History

Repository files navigation

VOCALS

demonstration

Project Overview

Technologies Used

Model Training Code

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages