VOCALS

VOCALS is an application that lets users record their voice or upload audio files in WAV format. The audio is sent to the backend for analysis, where a custom-trained model classifies it as either "stuttering" or "non-stuttering".
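Because uploads must be in WAV format, the backend would plausibly check a file before passing it to the model. The following is a minimal sketch of such a check using only Python's standard `wave` module. The function names `describe_wav` and `make_silent_wav` are hypothetical and are not taken from the project's code.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse raw WAV bytes and return basic properties.

    Hypothetical helper: raises ValueError when the bytes are not a readable WAV file.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "sample_width_bytes": wav.getsampwidth(),
                "duration_seconds": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a readable WAV file: {exc}") from exc


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build a mono, 16-bit silent WAV clip in memory (for trying out the check)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav()))
```

A check like this lets the server reject malformed uploads with a clear error before any model inference runs.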

Demonstration

Demo video: 2025-02-18.21-47-27.mp4

Project Overview

Frontend:
Built with Next.js, the frontend offers an intuitive interface for users to either record their voice directly or upload pre-recorded audio files.
Backend:
Developed with FastAPI in Python, the backend hosts a modified version of the openai-whisper-tiny ASR model. The model was adapted into a classifier and retrained with PyTorch on the sep-28k dataset from Hugging Face. During testing, it achieved 92% accuracy in distinguishing stuttering from non-stuttering audio.
User Feedback:
The application uses the Gemini API to give users feedback on their results. Generation settings such as the temperature were tuned to improve the quality of that feedback.
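The last step of the classification described above, turning the model's two output scores into a label and measuring accuracy on a test set, can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the label order in `LABELS` and the function names `classify` and `accuracy` are hypothetical, and the real backend works with PyTorch tensors.

```python
import math

# Assumed label order; the actual index-to-label mapping is defined by the trained model.
LABELS = ("non-stuttering", "stuttering")


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most likely label and its probability for one audio clip."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[index], probs[index]


def accuracy(predicted, expected):
    """Fraction of clips whose predicted label matches the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    label, confidence = classify([0.0, 2.0])
    print(label, round(confidence, 3))
```

Under this definition, an accuracy figure such as the reported 92% means that 92 of every 100 test clips received the correct label.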

Technologies Used

Frontend: Next.js
Backend: FastAPI (Python)
Model Training: PyTorch

Model Training Code

For further details about the implementation, please refer to the model training code by @Rishabh-Gi-t.
