Feature: Add Audio Input for Generating Q&A #82

IronJam11 · 2024-12-19T12:26:26Z

Resolves Issue #34

What does this PR do?
Resolves audio transcription functionality issues by implementing a robust Flask-based transcription service that handles multiple audio/video formats.

Changes Made:

Implemented unified audio conversion pipeline using FFmpeg
Added support for multiple formats including MP3, WAV, OGG, M4A, MP4, AVI, MOV, MKV, WEBM, AAC
Enhanced error handling and logging throughout the application
Added file validation and security measures (secure filenames, size limits)
Implemented automatic cleanup of temporary files
Integrated Google Speech Recognition with optimized settings
Added proper CORS support for cross-origin requests

Technical Details:

Uses FFmpeg for audio/video processing
Leverages Google Speech Recognition API for transcription
Standardizes audio conversion to 16kHz mono WAV format
Implements file size limit of 16MB
Uses Werkzeug security features for filename handling
Includes comprehensive logging system
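
For reference, the 16kHz mono WAV conversion can be sketched roughly as below. This is only an illustration, not the code in this PR: it calls the `ffmpeg` binary through `subprocess` instead of the `ffmpeg-python` wrapper, and the names `build_ffmpeg_command` and `convert_to_wav` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input audio or video file
        "-vn",                   # drop any video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit PCM samples
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> Path:
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

Keeping the command builder separate from the call that runs it makes the flags easy to inspect without FFmpeg installed.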

How to Test:

Install required dependencies:
pip install flask flask-cors SpeechRecognition ffmpeg-python werkzeug
Install FFmpeg on your system
Run the Flask application
Send a POST request to /upload with any supported audio/video file
Verify that you receive a JSON response with the transcription
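
A possible way to send the test request using only the Python standard library is sketched below. The form field name `file` and the URL `http://localhost:5000/upload` are assumptions (5000 is Flask's default development port); adjust them to match the application.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(url: str, path: str, field: str = "file") -> dict:
    """POST the file at path to url and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(field, os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `upload("http://localhost:5000/upload", "sample.mp3")` should return the parsed JSON response.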

Attached Video:
https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing

Related Issue:
#34

Roaster05 · 2024-12-20T09:26:27Z

@IronJam11 could you provide some more details about the implementation? Also, please improve the PR message.

IronJam11 · 2024-12-23T19:06:31Z

Resolves Issue #34: Audio Transcription Functionality

What does this PR do?

This PR addresses and resolves issues with the audio transcription functionality by implementing a robust Flask-based transcription service. It supports multiple audio/video formats, enhances security, and optimizes processing.

Key Changes Made

Audio Conversion Pipeline:
- Unified pipeline using FFmpeg for seamless audio/video conversion.
- Standardized conversion to 16kHz mono WAV format for optimal transcription accuracy.
Format Support:
- Added support for MP3, WAV, OGG, M4A, MP4, AVI, MOV, MKV, WEBM, AAC formats.
Error Handling & Logging:
- Enhanced error handling for robust failure management.
- Integrated comprehensive logging to track application behavior.
Security Enhancements:
- Implemented file validation, secure filenames (via Werkzeug), and file size limits (16MB).
- Automatic cleanup of temporary files to prevent resource leaks.
Transcription Integration:
- Leveraged Google Speech Recognition API with optimized settings for improved accuracy.
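
The file validation can be illustrated with a small standard-library sketch. It is not the exact code in this PR: the names `allowed_file` and `within_size_limit` are hypothetical, and filename sanitizing (done with Werkzeug in the PR) is left out here.

```python
import os

# Extensions taken from the format list above.
ALLOWED_EXTENSIONS = {
    "mp3", "wav", "ogg", "m4a", "mp4", "avi", "mov", "mkv", "webm", "aac",
}
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16MB limit described in this PR


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in one of the supported extensions."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in ALLOWED_EXTENSIONS


def within_size_limit(size_bytes: int) -> bool:
    """Return True for non-empty uploads that do not exceed the size limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES
```

The extension check is case-insensitive, so `talk.MP3` is accepted while `notes.txt` and names without an extension are rejected.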

Technical Details

Audio/Video Processing: Utilizes FFmpeg for format conversion and preprocessing.
Transcription: Integrates Google Speech Recognition for accurate and efficient transcription.
File Handling:
- Enforces file size limit (16MB).
- Uses secure filename handling via Werkzeug.
- Automatically cleans up temporary files post-processing.
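
Roughly, the transcription and cleanup steps fit together as in the sketch below. It assumes the input is already a WAV file, the helper names `transcribe_wav` and `with_temp_copy` are hypothetical, and `recognize_google` needs network access.

```python
import os
import tempfile
from typing import Callable


def transcribe_wav(wav_path: str) -> str:
    """Transcribe a WAV file with the SpeechRecognition package."""
    import speech_recognition as sr  # third-party dependency, imported lazily

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio)


def with_temp_copy(data: bytes, suffix: str, handler: Callable[[str], str]) -> str:
    """Write data to a temporary file, run handler on its path, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "upload" + suffix)
        with open(path, "wb") as fh:
            fh.write(data)
        # The directory is deleted on exit, even if handler raises.
        return handler(path)
```

For example, `with_temp_copy(wav_bytes, ".wav", transcribe_wav)` would transcribe the uploaded bytes and leave no temporary file behind.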

How to Test

Install required dependencies:

pip install flask flask-cors SpeechRecognition ffmpeg-python werkzeug

Install FFmpeg on your system.
Run the Flask application.
Send a POST request to /upload with any supported audio/video file.
Verify the JSON response containing the transcription.

Attached Video

Demonstration of functionality: [Link to Video](https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing)

Let me know if you would like me to explain more on this @Roaster05

Roaster05 · 2024-12-30T13:17:13Z

@IronJam11 I have two concerns; please try to find a workaround for them:

Could you find an alternative to installing FFmpeg on the local system to make this functionality work?
The size limit of 16 MB is far too low for videos, so can we increase it?

IronJam11 · 2024-12-30T13:28:58Z

I will make the changes and add the necessary details by midnight.

IronJam11 · 2024-12-30T17:23:02Z

I have increased the size limit to 50 MB. To use FFmpeg, you will need to install it and add it to your PATH environment variable.
Here’s a step-by-step guide to download and install FFmpeg on Windows, Linux, and macOS.

1. Windows

Manual Installation

Download FFmpeg:
- Go to [FFmpeg official site](https://ffmpeg.org/download.html).
- Under "Get packages & executable files", select Windows.
- Alternatively, download the latest FFmpeg build from [gyan.dev](https://www.gyan.dev/ffmpeg/builds/).
Extract Files:
- Download the ZIP file and extract it to a directory, e.g., C:\ffmpeg.
Add FFmpeg to PATH:
- Open the Start Menu, search for Environment Variables, and click on it.
- Under System Variables, find Path, and click Edit.
- Add the path to the bin directory, e.g., C:\ffmpeg\bin.
- Click OK to save.
Verify Installation:
- Open Command Prompt and run:
```
ffmpeg -version
```

Using Package Managers

Chocolatey:
```
choco install ffmpeg
```
Scoop:
```
scoop install ffmpeg
```

2. Linux

Debian/Ubuntu

Update the package list:
```
sudo apt update
```
Install FFmpeg:
```
sudo apt install ffmpeg
```
Verify Installation:
```
ffmpeg -version
```

Fedora

Install FFmpeg (on some Fedora releases the full build requires enabling the RPM Fusion repository first):
```
sudo dnf install ffmpeg
```
Verify Installation:
```
ffmpeg -version
```

Arch Linux

Install FFmpeg:
```
sudo pacman -S ffmpeg
```
Verify Installation:
```
ffmpeg -version
```

Snap (Universal for Linux):

Install FFmpeg:
```
sudo snap install ffmpeg
```
Verify Installation:
```
ffmpeg -version
```

3. macOS

Using Homebrew (Recommended)

Install Homebrew (if not already installed):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install FFmpeg:
```
brew install ffmpeg
```
Verify Installation:
```
ffmpeg -version
```

Using MacPorts

Update MacPorts (install it first from macports.org if it is not already installed):
```
sudo port selfupdate
```
Install FFmpeg:
```
sudo port install ffmpeg
```
Verify Installation:
```
ffmpeg -version
```
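
To confirm from Python that FFmpeg ended up on the PATH after any of the installation methods above, something like the following sketch could be used. The function names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line() -> Optional[str]:
    """Return the first line printed by `ffmpeg -version`, or None if FFmpeg is missing."""
    exe = find_ffmpeg()
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

The application could call `find_ffmpeg()` at startup and log a clear error when it returns `None`.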

@Roaster05

IronJam11 · 2025-01-02T07:55:13Z

@Roaster05 ???

Roaster05 · 2025-01-02T07:59:36Z

@Roaster05
@IronJam11 I meant that you should find an alternative to FFmpeg, rather than an alternative way to install it. We need some way to make this work on the server side itself. Also, please don't tag me multiple times; I usually reply within a few days.

Aaryan Jain added 2 commits December 19, 2024 17:31

add the audio feature

e0aa301

make minor changes

6a0005a

make improvements in the code structure

0691e55

Roaster05 added the high-priority label Dec 30, 2024

increase the storage capacity from 16mb to 50mb

ad45409
