VoxLogAI is an AI-powered web application designed for fast and accurate transcription of audio files and YouTube videos, as well as OCR (Optical Character Recognition) for images and PDFs, utilizing Google's powerful Gemini API. Get text from your audio and document content effortlessly.
Access the hosted version of the application here:
VoxLogAI offers a streamlined way to extract text from various media:
- Audio File Transcription: Upload and transcribe common audio formats (WAV, MP3, AIFF, AAC, OGG, FLAC).
- YouTube Video Transcription: Simply paste a YouTube URL to transcribe the video's audio content.
- Optional Timestamps: Include timestamps in your transcript to easily reference specific audio segments.
- Max Audio Size: Supports audio files up to 15MB (typically ~10-15 minutes, depending on quality).
- Image Text Extraction: Upload images (JPG, PNG, WEBP, HEIC) to extract contained text.
- PDF Text Extraction: Extract text from PDF documents with advanced OCR capabilities.
- Max Document Size: Supports image and PDF files up to 20MB.
- AI-Powered Accuracy: Leverages Google's advanced Gemini model for high-quality text extraction results.
- Privacy-Conscious: Your files are processed and are not permanently stored on the server.
- User-Friendly Interface: Clean, intuitive design with mode switching for different content types.
- Copy to Clipboard: Easily copy extracted text for use in other applications.
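The format and size limits listed above can be checked before a file is sent to the model. The sketch below is not VoxLogAI's actual code. The helper name `validate_upload`, the extension-based check, and the extra `.jpeg` spelling are assumptions made for illustration. The limits come from the list above: 15MB for audio and 20MB for images and PDFs. "MB" is assumed to mean 1024 × 1024 bytes.

```python
import os

# Hypothetical limits mirroring the feature list; "MB" is assumed to mean 1024 * 1024 bytes.
MB = 1024 * 1024

AUDIO_EXTENSIONS = {".wav", ".mp3", ".aiff", ".aac", ".ogg", ".flac"}
# ".jpeg" is an assumed alias for JPG; the list above only names JPG.
DOCUMENT_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".heic", ".pdf"}

MAX_AUDIO_BYTES = 15 * MB
MAX_DOCUMENT_BYTES = 20 * MB


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return "audio" or "document" for an acceptable upload; raise ValueError otherwise.

    Hypothetical helper: it checks only the file extension and size, not the file contents.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in AUDIO_EXTENSIONS:
        kind, limit = "audio", MAX_AUDIO_BYTES
    elif ext in DOCUMENT_EXTENSIONS:
        kind, limit = "document", MAX_DOCUMENT_BYTES
    else:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > limit:
        raise ValueError(f"{kind} file exceeds the {limit // MB}MB limit")
    return kind


if __name__ == "__main__":
    print(validate_upload("meeting.mp3", 12 * MB))  # audio
    print(validate_upload("scan.PDF", 5 * MB))      # document
```

An extension check is only a cheap first filter. A server would typically also inspect the MIME type or the file contents before processing an upload.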
If you wish to run your own instance of VoxLogAI:
- Docker and Docker Compose
- A Google Gemini API Key (You can obtain one from Google AI Studio)
To set it up:
- Clone this repository:
git clone https://github.com/antoniocascais/VoxLogAI.git
cd VoxLogAI
- Create an environment file from the example:
cp .env.example .env
- Edit the .env file and add your Google Gemini API key:
GEMINI_API_KEY=your_gemini_api_key_here
- Start the application using Docker Compose:
docker-compose up -d
The application will typically be available at http://localhost:5000 (or at the port mapped in your docker-compose.yml).
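After the container starts, you can confirm that the web server is answering by requesting its root URL. The sketch below uses only the Python standard library. The default URL assumes port 5000 as described above, and the helper name `is_app_up` is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_app_up(url: str = "http://localhost:5000/", timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at `url`, False if it cannot be reached."""
    # Bypass any proxy configured via environment variables, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-HTTP reply: treat as not reachable.
        return False


if __name__ == "__main__":
    if is_app_up():
        print("VoxLogAI is reachable at http://localhost:5000/")
    else:
        print("VoxLogAI is not reachable yet; check `docker-compose ps` and the container logs")
```

If you mapped a different port in docker-compose.yml, pass the URL explicitly, for example `is_app_up("http://localhost:8080/")`.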
For local development without Docker:
- Ensure you have Python 3.x installed.
- Install dependencies:
pip install -r requirements.txt
- Make sure GEMINI_API_KEY is set, either as an environment variable or in the .env file. If app.py does not already load the .env file, you may need to install python-dotenv and load the file there.
- Run the Flask application:
python app.py
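If app.py does not read the .env file on its own, one option is to load it at startup with python-dotenv. When that package is missing, the code can fall back to the plain environment. The sketch below is written under that assumption and is not the repository's code. The helper name `load_gemini_api_key` is hypothetical.

```python
import os


def load_gemini_api_key() -> str:
    """Return GEMINI_API_KEY, loading ./.env first if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
    except ImportError:
        pass  # python-dotenv not installed: rely on variables already in the environment
    else:
        # Load ./.env relative to the current working directory; a missing file is ignored,
        # and variables already set in the environment are not overridden (override=False).
        load_dotenv(".env")
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or export it in your shell")
    return key


if __name__ == "__main__":
    key = load_gemini_api_key()
    print(f"Loaded a Gemini API key ending in ...{key[-4:]}")
```

Raising an error at startup makes a missing key obvious immediately. Otherwise it would surface only as a failed request on the first transcription.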
Your feedback is valuable! If you encounter any bugs, have suggestions for improvement, or would like to request a new feature:
- Please open an issue on the GitHub repository.
I'll review issues and consider them for future updates.
For direct inquiries or questions not suitable for a GitHub issue, you can reach out via email:

