A Streamlit application that allows users to:
- Input podcast RSS feed URLs
- Convert podcast audio to text using WhisperX
- Generate summaries using OpenAI's GPT models, with tiktoken for token counting
- Save and browse podcast transcripts and summaries
- RSS Feed Parsing: Parse podcast RSS feeds to extract episode information
- Audio Download: Download podcast episodes from RSS feeds
- Voice-to-Text Conversion: Transcribe podcast audio using WhisperX
- Text Summarization: Generate summaries of podcast transcripts using OpenAI's GPT models
- Storage: Save podcast transcripts and summaries for later viewing
- User Interface: Browse and view saved podcasts
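As an illustration of the RSS parsing step, here is a minimal sketch that pulls episode titles and audio URLs out of a feed using only the standard library. The function name `parse_episodes` and the sample feed are hypothetical, and the project's own podcast_parser.py may use a different approach or library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample feed, used only to demonstrate the parser below.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
  </channel>
</rss>"""


def parse_episodes(rss_xml: str) -> List[Dict[str, str]]:
    """Return one dict per <item> that carries an <enclosure> URL."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None or not enclosure.get("url"):
            continue  # no downloadable audio, so nothing to transcribe
        episodes.append({
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "audio_url": enclosure.get("url"),
        })
    return episodes


if __name__ == "__main__":
    for episode in parse_episodes(SAMPLE_FEED):
        print(episode["title"], episode["audio_url"])
```

Items without an enclosure are skipped because they have no audio to download.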
- Python 3.8+
- FFmpeg (for audio processing)
- CUDA-compatible GPU (optional, for faster transcription)
- OpenAI API key
- Clone the repository:
git clone <repository-url>
cd podcast-rss-to-text
- Install required packages:
pip install -r requirements.txt
- Install WhisperX:
pip install git+https://github.com/m-bain/whisperx.git
- Install FFmpeg:
  - Windows: Download from ffmpeg.org and add to PATH
  - Mac: brew install ffmpeg
  - Linux: sudo apt install ffmpeg
- Copy the .env.example file to .env and add your OpenAI API key:
cp .env.example .env
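For reference, reading the key back from the .env file could look roughly like the sketch below. It assumes the variable is named `OPENAI_API_KEY` (check .env.example for the actual name) and uses a small hand-written parser. The helper names `load_env_file` and `get_openai_api_key` are hypothetical, and the application itself may load the file differently.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    # OPENAI_API_KEY is an assumed variable name, not confirmed by this README.
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message avoids starting a long transcription only to have the summary step fail afterwards.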
- Start the Streamlit application:
streamlit run main.py
- Open your browser and navigate to the URL displayed in the terminal (usually http://localhost:8501)
- Navigate to "Add New Podcast" and enter a podcast RSS feed URL
- Select an episode to process and click "Process Episode"
- View the generated transcript and summary
- Access saved podcasts through the "Saved Podcasts" page
- main.py: Main Streamlit application
- db_manager.py: SQLite database manager for podcast data
- podcast_parser.py: RSS feed parser and audio download functionality
- transcription.py: Audio transcription using WhisperX
- summary_generator.py: Text summarization using OpenAI API and tiktoken
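Long transcripts can exceed a model's context window, which is one reason a tokenizer such as tiktoken is useful when summarizing. The sketch below shows one possible way to split a transcript into token-bounded chunks. The function `chunk_by_tokens` and the 3000-token default are assumptions for illustration, not the actual logic in summary_generator.py.

```python
from typing import Callable, List, Sequence


def chunk_by_tokens(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int = 3000,  # assumed budget; tune for the model in use
) -> List[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    return [
        decode(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


# With tiktoken installed, the tokenizer functions could be supplied like this:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   chunks = chunk_by_tokens(transcript, enc.encode, enc.decode)
```

Each chunk can then be summarized separately and the partial summaries combined. Note that cutting on raw token boundaries may split a word or sentence, so a real implementation might prefer to break at sentence ends.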
- Transcription can take some time depending on the length of the audio and your hardware
- The application creates the following directories:
  - audio_files: Downloaded and converted audio files
  - transcriptions: Text transcripts of podcasts
- WhisperX installation issues: Make sure you have the correct CUDA version installed for your GPU
- Audio conversion errors: Check that FFmpeg is properly installed and accessible in your PATH
- OpenAI API errors: Verify your API key is correct and has sufficient credits
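A quick way to narrow down the issues above is a small diagnostic script. The following is a sketch rather than part of the project: `check_environment` is a hypothetical helper, `OPENAI_API_KEY` is an assumed variable name, and the key check only looks at the process environment, not at the .env file.

```python
import os
import shutil
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether common prerequisites appear to be in place."""
    report = {
        # FFmpeg must be discoverable on PATH for audio conversion.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # Assumed variable name; only the process environment is checked.
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
        # CUDA is optional; transcription falls back to CPU without it.
        "cuda_available": False,
    }
    try:
        import torch  # normally installed alongside WhisperX

        report["cuda_available"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        pass  # torch missing or failed to load; leave CUDA as unavailable
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for `cuda_available` is not an error by itself, since the GPU is optional, but it does explain slow transcription.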