Feature: Add Audio Input for Generating Q&A #82

Open: wants to merge 4 commits into main


IronJam11

Resolves Issue #34

What does this PR do?
Resolves the audio transcription issues by implementing a Flask-based transcription service that converts uploaded audio/video files with FFmpeg and transcribes them with Google Speech Recognition.

Changes Made:

  • Implemented a unified audio conversion pipeline using FFmpeg
  • Added support for multiple formats: MP3, WAV, OGG, M4A, MP4, AVI, MOV, MKV, WEBM, and AAC
  • Enhanced error handling and logging throughout the application
  • Added file validation and security measures (secure filenames, size limits)
  • Implemented automatic cleanup of temporary files
  • Integrated Google Speech Recognition with optimized settings
  • Added CORS support for cross-origin requests

Technical Details:

  • Uses FFmpeg for audio/video processing
  • Uses the Google Speech Recognition API for transcription
  • Converts all input to 16 kHz mono WAV
  • Enforces a 16 MB file size limit
  • Uses Werkzeug security features for filename handling
  • Includes comprehensive logging
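The conversion step can be sketched as a single FFmpeg call. This is a minimal illustration under assumptions, not the PR's code: the PR lists ffmpeg-python as a dependency, whereas this sketch assembles the equivalent command line and runs it with subprocess. The helper names build_ffmpeg_command and convert_to_wav are hypothetical; -vn, -ac, -ar and -acodec are standard FFmpeg options.

```python
import subprocess
from pathlib import Path

# Target format named in the PR: 16 kHz, mono, WAV.
SAMPLE_RATE_HZ = 16000
CHANNELS = 1


def build_ffmpeg_command(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> list[str]:
    """Return an FFmpeg argv converting any audio/video file to 16 kHz mono WAV."""
    return [
        ffmpeg,
        "-y",                        # overwrite dst if it already exists
        "-i", str(src),              # input: audio file or video container
        "-vn",                       # ignore any video stream
        "-ac", str(CHANNELS),        # downmix to mono
        "-ar", str(SAMPLE_RATE_HZ),  # resample to 16 kHz
        "-acodec", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path, ffmpeg: str = "ffmpeg") -> Path:
    """Run FFmpeg; on failure raises CalledProcessError whose .stderr holds FFmpeg's log."""
    subprocess.run(build_ffmpeg_command(src, dst, ffmpeg), check=True, capture_output=True)
    return dst
```

Keeping the argument list separate from the subprocess call makes the command easy to log and to test without FFmpeg installed.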

How to Test:

  1. Install the required dependencies:
    pip install flask flask-cors SpeechRecognition ffmpeg-python werkzeug
  2. Install FFmpeg on your system.
  3. Run the Flask application.
  4. Send a POST request to /upload with any supported audio/video file.
  5. Verify that you receive a JSON response with the transcription.
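Step 4 can be scripted with only the Python standard library. The PR does not state the form field name or the port, so the field name "file" and http://localhost:5000 (the Flask development server's default) are assumptions, and build_multipart and upload are hypothetical helpers.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, url: str = "http://localhost:5000/upload", field: str = "file") -> dict:
    """POST one media file to the transcription endpoint and return the decoded JSON."""
    media = Path(path)
    body, content_type = build_multipart(field, media.name, media.read_bytes())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, upload("sample.mp3") returns the decoded JSON body; a 4xx or 5xx response raises urllib.error.HTTPError instead.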

Attached Video:
https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing

Related Issue:
#34

@Roaster05
Contributor

@IronJam11 could you provide some more details about the implementation? Also, please improve the PR message.

@IronJam11
Author

Resolves Issue #34: Audio Transcription Functionality

What does this PR do?

This PR addresses and resolves issues with the audio transcription functionality by implementing a robust Flask-based transcription service. It supports multiple audio/video formats, enhances security, and optimizes processing.


Key Changes Made

  • Audio Conversion Pipeline:

    • Unified pipeline using FFmpeg for seamless audio/video conversion.
    • Standardized conversion to 16kHz mono WAV format for optimal transcription accuracy.
  • Format Support:

    • Added support for MP3, WAV, OGG, M4A, MP4, AVI, MOV, MKV, WEBM, AAC formats.
  • Error Handling & Logging:

    • Enhanced error handling for robust failure management.
    • Integrated comprehensive logging to track application behavior.
  • Security Enhancements:

    • Implemented file validation, secure filenames (via Werkzeug), and file size limits (16MB).
    • Automatic cleanup of temporary files to prevent resource leaks.
  • Transcription Integration:

    • Leveraged Google Speech Recognition API with optimized settings for improved accuracy.
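A minimal sketch of the transcription step using the SpeechRecognition package that the PR installs. Its AudioFile class reads WAV, AIFF, and FLAC, which is one reason to normalize every upload to WAV first. The PR's "optimized settings" are not shown, so this sketch uses library defaults apart from language; the function name and the response keys "transcription" and "error" are assumptions.

```python
def transcribe_wav(wav_path: str, language: str = "en-US") -> dict:
    """Transcribe a WAV file with the SpeechRecognition package's Google recognizer.

    Returns a JSON-ready dict; the "transcription"/"error" keys are hypothetical.
    """
    # Imported lazily so the module can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:  # reads WAV, AIFF or FLAC
        audio = recognizer.record(source)   # load the whole file into memory
    try:
        text = recognizer.recognize_google(audio, language=language)
        return {"transcription": text}
    except sr.UnknownValueError:
        # The service found no recognizable speech.
        return {"error": "Speech could not be understood"}
    except sr.RequestError as exc:
        # Network problem, or the API rejected the request.
        return {"error": f"Speech recognition service unavailable: {exc}"}
```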

Technical Details

  • Audio/Video Processing: Utilizes FFmpeg for format conversion and preprocessing.
  • Transcription: Integrates Google Speech Recognition for accurate and efficient transcription.
  • File Handling:
    • Enforces file size limit (16MB).
    • Uses secure filename handling via Werkzeug.
    • Automatically cleans up temporary files post-processing.
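The file-handling rules above can be sketched as follows, assuming the conversion and transcription steps are passed in as callables (convert, transcribe and the other names here are hypothetical). Instead of Werkzeug's secure_filename, which the PR uses, this sketch sidesteps the client-supplied name entirely and keeps only its validated extension; tempfile.TemporaryDirectory provides the automatic cleanup.

```python
import tempfile
from pathlib import Path, PurePath
from typing import Callable

# Formats listed in this PR.
ALLOWED_EXTENSIONS = frozenset(
    {"mp3", "wav", "ogg", "m4a", "mp4", "avi", "mov", "mkv", "webm", "aac"}
)


def is_allowed(filename: str) -> bool:
    """True if the (case-insensitive) extension is one of the supported formats."""
    return PurePath(filename).suffix.lower().lstrip(".") in ALLOWED_EXTENSIONS


def process_upload(
    data: bytes,
    filename: str,
    convert: Callable[[Path, Path], None],
    transcribe: Callable[[Path], str],
) -> str:
    """Validate, save, convert and transcribe one upload inside a throwaway directory."""
    if not is_allowed(filename):
        raise ValueError(f"unsupported file type: {filename!r}")
    # TemporaryDirectory deletes the directory and everything in it when the
    # with-block exits, whether it returns normally or raises.
    with tempfile.TemporaryDirectory(prefix="transcribe-") as tmp:
        # Only the validated extension is reused; the client's name never becomes a path.
        src = Path(tmp) / ("upload" + PurePath(filename).suffix.lower())
        wav = Path(tmp) / "audio.wav"
        src.write_bytes(data)
        convert(src, wav)
        return transcribe(wav)
```

An extension check does not prove the contents are media; a file that merely carries an allowed extension will still surface as an error from the conversion step.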

How to Test

  1. Install required dependencies:
    pip install flask flask-cors SpeechRecognition ffmpeg-python werkzeug
  2. Install FFmpeg on your system.
  3. Run the Flask application.
  4. Send a POST request to /upload with any supported audio/video file.
  5. Verify the JSON response containing the transcription.

Attached Video

Demonstration of functionality: [Link to Video](https://drive.google.com/file/d/1K0r7-J2cgbfU4AhBG87kxVZl3BzUGdUl/view?usp=sharing)


Let me know if you would like me to explain more on this @Roaster05

@Roaster05
Contributor

@IronJam11 I have two concerns; try to find a workaround for them:

  • Could you find an alternative to installing FFmpeg on the local system to make this functionality work?
  • The 16 MB size limit is far too small for videos; can we increase it?
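The limit is ultimately a byte count. Reading MB as 1024 × 1024 bytes, a 16 MB cap is 16,777,216 bytes and a 50 MB cap is 52,428,800 bytes. In Flask, a common way to enforce such a cap is the MAX_CONTENT_LENGTH config value, which makes Flask reject larger requests with HTTP 413; the PR does not show which mechanism it uses. The stdlib-only sketch here (mb_to_bytes, UploadTooLarge and copy_with_limit are hypothetical names) shows the check applied while streaming, so a large video is never held in memory all at once.

```python
from typing import BinaryIO


def mb_to_bytes(mb: int) -> int:
    """Convert megabytes (1 MB = 1024 * 1024 bytes here) to a byte count."""
    return mb * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured byte limit."""


def copy_with_limit(src: BinaryIO, dst: BinaryIO, limit: int, chunk_size: int = 1 << 20) -> int:
    """Stream src into dst in chunks, aborting once more than `limit` bytes arrive.

    Returns the number of bytes copied. Memory use stays at about one chunk.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        dst.write(chunk)
```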

@IronJam11
Author

I will make the changes and add the necessary details by midnight.

@IronJam11
Author

I have increased the upload size limit to 50 MB. To use FFmpeg, it must be installed and its bin directory must be on your PATH environment variable.
Here’s a step-by-step guide to downloading and installing FFmpeg on Windows, Linux, and macOS.


1. Windows

Manual Installation

  1. Download FFmpeg:

  2. Extract Files:

    • Download the ZIP file and extract it to a directory, e.g., C:\ffmpeg.
  3. Add FFmpeg to PATH:

    • Open the Start Menu, search for Environment Variables, and click on it.
    • Under System Variables, find Path, and click Edit.
    • Add the path to the bin directory, e.g., C:\ffmpeg\bin.
    • Click OK to save.
  4. Verify Installation:

    • Open Command Prompt and run:
      ffmpeg -version

Using Package Managers

  • Chocolatey:
    choco install ffmpeg
  • Scoop:
    scoop install ffmpeg

2. Linux

Debian/Ubuntu

  1. Update the package list:
    sudo apt update
  2. Install FFmpeg:
    sudo apt install ffmpeg
  3. Verify Installation:
    ffmpeg -version

Fedora

  1. Install FFmpeg:
    sudo dnf install ffmpeg
  2. Verify Installation:
    ffmpeg -version

Arch Linux

  1. Install FFmpeg:
    sudo pacman -S ffmpeg
  2. Verify Installation:
    ffmpeg -version

Snap (Universal for Linux):

  1. Install FFmpeg:
    sudo snap install ffmpeg
  2. Verify Installation:
    ffmpeg -version

3. macOS

Using Homebrew (Recommended)

  1. Install Homebrew (if not already installed):
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install FFmpeg:
    brew install ffmpeg
  3. Verify Installation:
    ffmpeg -version

Using MacPorts

  1. Install MacPorts using the installer from macports.org, then update it:
    sudo port selfupdate
  2. Install FFmpeg:
    sudo port install ffmpeg
  3. Verify Installation:
    ffmpeg -version
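The verification step can also run from Python when the service starts, so a missing FFmpeg produces a clear error instead of a failure on the first upload. A minimal sketch using shutil.which; the helper names and the extra_dirs parameter are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Iterable, Optional


def find_ffmpeg(extra_dirs: Iterable[str] = ()) -> Optional[str]:
    r"""Return the path of an ffmpeg executable, or None if none is found.

    Directories in extra_dirs (for example an unzipped C:\ffmpeg\bin) are
    searched before the directories on PATH; empty entries are skipped.
    """
    dirs = (*extra_dirs, os.environ.get("PATH", ""))
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which("ffmpeg", path=search_path)


def ffmpeg_version(exe: str) -> str:
    """Return the first line of `ffmpeg -version`, the same check as in the guide."""
    result = subprocess.run([exe, "-version"], check=True, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

For example, the service could call find_ffmpeg() at startup and exit with an explanatory message if it returns None.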

@Roaster05

@IronJam11
Author

@Roaster05 ???

@Roaster05
Contributor

@IronJam11 I meant for you to find an alternative to FFmpeg, not an alternative way to install it. We need some way to make it work on the server side itself. Also, please don't tag me multiple times; I usually reply in a few days.
