
raghav-potdar/Podcast-Summarizer-RSS-Feed


Podcast RSS to Text Summary

A Streamlit application that allows users to:

  1. Input podcast RSS feed URLs
  2. Convert podcast audio to text using WhisperX
  3. Generate summaries using OpenAI's GPT models, with tiktoken for token counting
  4. Save and browse podcast transcripts and summaries
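tiktoken counts tokens. It does not summarize anything. A typical use is splitting a long transcript into pieces that fit a model's context window before each piece is summarized. The function below is a hypothetical sketch of that kind of chunking and is not taken from summary_generator.py. The name `chunk_by_tokens`, the 3000-token default and the overlap are assumptions. It takes encode/decode callables, so it runs without tiktoken installed.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[Sequence[int]], str],
    max_tokens: int = 3000,
    overlap: int = 100,
) -> List[str]:
    """Split text into pieces of at most max_tokens tokens (hypothetical helper).

    Consecutive chunks share `overlap` tokens, so a sentence cut at a chunk
    boundary still appears whole in at least one of the two chunks.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = list(encode(text))
    chunks: List[str] = []
    step = max_tokens - overlap  # how far each window advances
    for start in range(0, len(tokens), step):
        window = tokens[start : start + max_tokens]
        chunks.append(decode(window))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

If tiktoken is installed, you would pass the encoder's own methods: `enc = tiktoken.get_encoding("cl100k_base")`, then `chunk_by_tokens(transcript, enc.encode, enc.decode)`. You could then summarize each chunk and merge the partial summaries in a final request. Whether the project does exactly this is an assumption.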

Features

  • RSS Feed Parsing: Parse podcast RSS feeds to extract episode information
  • Audio Download: Download podcast episodes from RSS feeds
  • Voice-to-Text Conversion: Transcribe podcast audio using WhisperX
  • Text Summarization: Generate summaries of podcast transcripts using OpenAI's GPT models
  • Storage: Save podcast transcripts and summaries for later viewing
  • User Interface: Browse and view saved podcasts
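RSS parsing and audio download both depend on one convention. Each episode is an `<item>`, and its `<enclosure>` element carries the audio URL. Below is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`. podcast_parser.py may use a different library, such as feedparser. The name `parse_episodes` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_episodes(rss_xml: str) -> List[Dict[str, str]]:
    """Return title, publication date and audio URL for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    episodes: List[Dict[str, str]] = []
    for item in root.iter("item"):
        # Podcast audio is attached as <enclosure url="..." type="audio/mpeg"/>
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # skip items without downloadable audio
        episodes.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "published": (item.findtext("pubDate") or "").strip(),
                "audio_url": enclosure.get("url", ""),
            }
        )
    return episodes
```

You could download each returned `audio_url` with `urllib.request.urlretrieve(url, filename)` into the audio_files directory. Real feeds also carry namespaced tags, such as iTunes metadata, which this sketch ignores.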

Requirements

  • Python 3.8+
  • FFmpeg (for audio processing)
  • CUDA-compatible GPU (optional, for faster transcription)
  • OpenAI API key
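You can check the first two requirements from Python before installing anything else. The helper below is hypothetical and not part of the repository. It tests the interpreter version and whether an `ffmpeg` executable is on PATH. It skips GPU detection because that needs PyTorch (`torch.cuda.is_available()`), which WhisperX depends on.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the interpreter version and FFmpeg look usable (hypothetical helper)."""
    return {
        # sys.version_info compares element-wise with a plain tuple
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:10} {'OK' if ok else 'MISSING'}")
```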

Installation

  1. Clone the repository:
git clone <repository-url>
cd podcast-rss-to-text
  2. Install required packages:
pip install -r requirements.txt
  3. Install WhisperX:
pip install git+https://github.com/m-bain/whisperx.git
  4. Install FFmpeg:

    • Windows: Download from ffmpeg.org and add to PATH
    • Mac: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
  5. Copy the .env.example file to .env, then add your OpenAI API key to .env:

cp .env.example .env
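The .env file holds plain `KEY=VALUE` lines. The app most likely loads it with a library such as python-dotenv, but that is an assumption. The stdlib-only stand-in below shows the usual behaviour:

- It skips blank lines and `#` comments.
- It strips matching quotes around a value.
- A variable already set in the environment keeps its value unless you pass `override=True`.

`load_env_file` is a hypothetical name. The variable name `OPENAI_API_KEY` is also an assumption: it is the name the official openai client reads by default. Check .env.example for the name the app actually expects.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read KEY=VALUE lines from a .env file into os.environ (minimal python-dotenv stand-in)."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "sk-..." or 'sk-...'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Call `load_env_file()` at startup, then check `os.environ.get("OPENAI_API_KEY")`. This gives an early, readable error instead of a failed API call later.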

Usage

  1. Start the Streamlit application:
streamlit run main.py
  2. Open your browser and navigate to the URL displayed in the terminal (usually http://localhost:8501)
  3. Navigate to "Add New Podcast" and enter a podcast RSS feed URL
  4. Select an episode to process and click "Process Episode"
  5. View the generated transcript and summary
  6. Access saved podcasts through the "Saved Podcasts" page

Project Structure

  • main.py: Main Streamlit application
  • db_manager.py: SQLite database manager for podcast data
  • podcast_parser.py: RSS feed parser and audio download functionality
  • transcription.py: Audio transcription using WhisperX
  • summary_generator.py: Text summarization using OpenAI API and tiktoken
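The README describes db_manager.py only as an SQLite manager, so the schema below is hypothetical. It uses one `episodes` table holding the feed URL, title, transcript and summary. Two helpers save an episode and list saved ones newest first, roughly what a "Saved Podcasts" page needs. The sketch uses the standard `sqlite3` module with parameterized queries.

```python
import sqlite3
from typing import List, Tuple


def init_db(path: str = "podcasts.db") -> sqlite3.Connection:
    """Open (or create) the database file and ensure the episodes table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS episodes (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            feed_url TEXT NOT NULL,
            title TEXT NOT NULL,
            transcript TEXT,
            summary TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()
    return conn


def save_episode(
    conn: sqlite3.Connection, feed_url: str, title: str, transcript: str, summary: str
) -> int:
    """Insert one processed episode and return its row id."""
    # "?" placeholders let sqlite3 bind values safely instead of string formatting
    cur = conn.execute(
        "INSERT INTO episodes (feed_url, title, transcript, summary) VALUES (?, ?, ?, ?)",
        (feed_url, title, transcript, summary),
    )
    conn.commit()
    return cur.lastrowid


def list_episodes(conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    """Return (id, title, summary) for every saved episode, newest first."""
    return conn.execute(
        "SELECT id, title, summary FROM episodes ORDER BY id DESC"
    ).fetchall()
```

Passing `":memory:"` to `init_db` gives a throwaway database, which is handy for tests.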

Notes

  • Transcription can take a while: processing time grows with episode length, and running on CPU is considerably slower than on a CUDA-compatible GPU
  • The application creates the following directories:
    • audio_files: Downloaded and converted audio files
    • transcriptions: Text transcripts of podcasts
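Here is one way a transcript might be written into the transcriptions directory. The sketch assumes segments shaped like WhisperX's transcription output: dicts with `start` in seconds and `text`. The file-naming rule and the `[HH:MM:SS]` line format are illustrative choices, not necessarily what transcription.py does. All function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Iterable


def safe_filename(title: str, max_len: int = 80) -> str:
    """Turn an episode title into a filesystem-safe file stem."""
    # Replace each run of characters outside [A-Za-z0-9._-] with a single underscore
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("._")
    return (stem or "episode")[:max_len]


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def save_transcript(title: str, segments: Iterable[Dict], out_dir: str = "transcriptions") -> Path:
    """Write one '[HH:MM:SS] text' line per segment and return the file path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    lines = [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}" for seg in segments]
    path = folder / f"{safe_filename(title)}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```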

Troubleshooting

  • WhisperX installation issues: Make sure you have the correct CUDA version installed for your GPU
  • Audio conversion errors: Check that FFmpeg is properly installed and accessible in your PATH
  • OpenAI API errors: Verify your API key is correct and has sufficient credits
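Audio conversion errors usually have one of two causes. Either the `ffmpeg` binary is not found, or a conversion fails and the reason is printed only on stderr. The hypothetical helper below checks PATH first and shows ffmpeg's stderr when a conversion fails. It converts to 16 kHz mono WAV because Whisper-family models operate on 16 kHz mono audio. Whether the project converts to exactly this format is an assumption.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Command that converts any input to mono 16-bit PCM WAV at sample_rate Hz."""
    return [
        "ffmpeg", "-nostdin", "-y",  # never wait for keyboard input; overwrite dst
        "-i", src,
        "-ac", "1",                  # downmix to one channel
        "-ar", str(sample_rate),     # resample
        "-c:a", "pcm_s16le",         # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion, raising a readable error if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH; install it and add it to PATH")
    result = subprocess.run(build_ffmpeg_command(src, dst), capture_output=True, text=True)
    if result.returncode != 0:
        # ffmpeg reports problems on stderr; keep the tail, which holds the actual error
        raise RuntimeError(f"ffmpeg failed:\n{result.stderr[-2000:]}")
```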

License

MIT License

About

A Streamlit-powered tool that fetches podcast episodes via RSS, transcribes audio with WhisperX, and generates concise summaries using OpenAI. Features include a local SQLite database for history and token-optimized processing with tiktoken.
