Podcast RSS to Text Summary

A Streamlit application that lets users:

- Input podcast RSS feed URLs
- Convert podcast audio to text using WhisperX
- Generate summaries with OpenAI's GPT models, using tiktoken to count tokens
- Save and browse podcast transcripts and summaries

Features

- RSS Feed Parsing: Parse podcast RSS feeds to extract episode information
- Audio Download: Download podcast episodes from RSS feeds
- Voice-to-Text Conversion: Transcribe podcast audio using WhisperX
- Text Summarization: Generate summaries of podcast transcripts using OpenAI's GPT models
- Storage: Save podcast transcripts and summaries for later viewing
- User Interface: Browse and view saved podcasts
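
The contents of podcast_parser.py are not reproduced here. As a rough illustration of what RSS feed parsing involves, the sketch below uses only Python's standard library to pull each episode's title, audio URL and publication date out of an RSS 2.0 feed. The names Episode and parse_rss are hypothetical, and the project itself may use a different parser or extract more fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    # Hypothetical record type; not taken from podcast_parser.py
    title: str
    audio_url: Optional[str]
    published: Optional[str]


def parse_rss(xml_text: str) -> List[Episode]:
    """Extract title, enclosure URL and pubDate for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    episodes = []
    # RSS 2.0 lists episodes as <item> elements under <channel>
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append(
            Episode(
                title=(item.findtext("title") or "").strip(),
                # The episode audio is referenced by the enclosure's url attribute
                audio_url=enclosure.get("url") if enclosure is not None else None,
                # findtext returns None when the element is missing
                published=item.findtext("pubDate"),
            )
        )
    return episodes
```

This ignores namespaced extensions such as the iTunes tags many podcast feeds carry; a production parser would likely need to handle those too.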

Requirements

- Python 3.8+
- FFmpeg (for audio processing)
- CUDA-compatible GPU (optional, for faster transcription)
- OpenAI API key
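
Because the GPU is optional, transcription code has to choose a device at runtime. The sketch below shows one way to do that with WhisperX's Python API (whisperx.load_model, whisperx.load_audio, model.transcribe), using float16 on a CUDA GPU and int8 on CPU. It is not taken from transcription.py: the helper names, the "large-v2" model and the batch size of 16 are assumptions.

```python
from typing import Dict, List, Tuple


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: return (device, compute_type) for WhisperX.

    float16 is the usual choice on a CUDA GPU; int8 keeps CPU inference practical.
    """
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")


def segments_to_text(segments: List[Dict]) -> str:
    """Join the text of WhisperX result segments into a single transcript string."""
    return " ".join(seg["text"].strip() for seg in segments if seg.get("text"))


def transcribe(audio_path: str, model_name: str = "large-v2", batch_size: int = 16) -> str:
    # Imported lazily so the helpers above work without WhisperX/PyTorch installed.
    # model_name and batch_size are assumed defaults, not the project's settings.
    import torch
    import whisperx

    device, compute_type = pick_device(torch.cuda.is_available())
    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=batch_size)
    return segments_to_text(result["segments"])
```

On machines with little GPU memory, lowering batch_size or switching compute_type to int8 trades speed for a smaller memory footprint.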

Installation

Clone the repository:

git clone <repository-url>
cd podcast-rss-to-text

Install required packages:

pip install -r requirements.txt

Install WhisperX:

pip install git+https://github.com/m-bain/whisperx.git

Install FFmpeg:
- Windows: Download a build from ffmpeg.org and add its bin folder to your PATH
- Mac: brew install ffmpeg
- Linux (Debian/Ubuntu): sudo apt install ffmpeg

Copy the .env.example file to .env, then add your OpenAI API key to .env:

cp .env.example .env

Usage

Start the Streamlit application:

streamlit run main.py

Open your browser and navigate to the URL displayed in the terminal (usually http://localhost:8501)
Navigate to "Add New Podcast" and enter a podcast RSS feed URL
Select an episode to process and click "Process Episode"
View the generated transcript and summary
Access saved podcasts through the "Saved Podcasts" page
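
Saved podcasts are stored in SQLite through db_manager.py. Its actual schema is not documented here, so the following is a minimal sketch under an assumed single-table layout (the episodes table, its columns and the helper functions are all hypothetical). It uses Python's built-in sqlite3 module to save a processed episode and to list and reload saved ones.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: the real db_manager.py may use different tables/columns
SCHEMA = """
CREATE TABLE IF NOT EXISTS episodes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    podcast_title TEXT NOT NULL,
    episode_title TEXT NOT NULL,
    audio_url TEXT UNIQUE,
    transcript TEXT,
    summary TEXT
)
"""


def open_db(path: str = "podcasts.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_episode(conn: sqlite3.Connection, podcast_title: str, episode_title: str,
                 audio_url: str, transcript: str, summary: str) -> None:
    # INSERT OR REPLACE: re-processing the same audio_url replaces the old row
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO episodes "
            "(podcast_title, episode_title, audio_url, transcript, summary) "
            "VALUES (?, ?, ?, ?, ?)",
            (podcast_title, episode_title, audio_url, transcript, summary),
        )


def list_episodes(conn: sqlite3.Connection) -> List[Tuple[int, str, str]]:
    # Most recently saved first, suitable for a "Saved Podcasts" listing
    return conn.execute(
        "SELECT id, podcast_title, episode_title FROM episodes ORDER BY id DESC"
    ).fetchall()


def get_episode(conn: sqlite3.Connection, episode_id: int) -> Optional[Tuple[str, str]]:
    # Returns (transcript, summary), or None if the id does not exist
    return conn.execute(
        "SELECT transcript, summary FROM episodes WHERE id = ?", (episode_id,)
    ).fetchone()
```

Keying rows on the audio URL keeps one record per episode even if the same episode is processed twice.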

Project Structure

- main.py: Main Streamlit application
- db_manager.py: SQLite database manager for podcast data
- podcast_parser.py: RSS feed parser and audio download functionality
- transcription.py: Audio transcription using WhisperX
- summary_generator.py: Text summarization using OpenAI API and tiktoken
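
A full episode transcript can exceed a GPT model's context window, which is one common reason to pair the OpenAI API with tiktoken. The sketch below assumes summary_generator.py splits the transcript into token-limited chunks that are summarized separately; that is an assumption, not the documented behavior. The hypothetical chunk_by_tokens packs whole words into chunks and takes the token counter as a parameter, so it runs without tiktoken.

```python
from typing import Callable, List


def chunk_by_tokens(text: str, max_tokens: int,
                    count_tokens: Callable[[str], int]) -> List[str]:
    """Hypothetical helper: greedily pack whole words into chunks of <= max_tokens.

    A single word longer than max_tokens still becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this word would overflow: close the current chunk
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With tiktoken installed, a real counter could be built as `enc = tiktoken.get_encoding("cl100k_base")` and passed as `lambda s: len(enc.encode(s))`; which encoding matches your model is something to check against the tiktoken documentation.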

Notes

Transcription can take a while: the time grows with the length of the episode and depends on your hardware, with a CUDA-compatible GPU being much faster than a CPU
The application creates the following directories:
- audio_files: Downloaded and converted audio files
- transcriptions: Text transcripts of podcasts
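
Whisper-family models work on 16 kHz mono audio, so "converted audio files" typically means resampling the downloaded episode with FFmpeg. The sketch below assumes that is what happens before files land in audio_files; the helper names and the WAV output format are hypothetical choices, while the ffmpeg flags (-y, -i, -ac, -ar) are standard.

```python
import subprocess
from pathlib import Path
from typing import List

AUDIO_DIR = Path("audio_files")


def ffmpeg_wav_command(src: Path, dst: Path, sample_rate: int = 16000) -> List[str]:
    """Hypothetical helper: build an ffmpeg command that writes mono WAV at sample_rate."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", str(src),            # input file (mp3, m4a, ...)
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample to the target rate
        str(dst),
    ]


def convert_to_wav(src: Path) -> Path:
    AUDIO_DIR.mkdir(exist_ok=True)
    dst = AUDIO_DIR / (src.stem + ".wav")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status
    subprocess.run(ffmpeg_wav_command(src, dst), check=True, capture_output=True)
    return dst
```

Passing the command as a list (rather than a shell string) avoids quoting problems with episode file names that contain spaces.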

Troubleshooting

- WhisperX installation issues: Make sure the CUDA version installed on your system matches your GPU driver and the PyTorch build that WhisperX uses
- Audio conversion errors: Check that FFmpeg is installed and that the ffmpeg command is on your PATH
- OpenAI API errors: Verify that your API key is correct and that your OpenAI account has sufficient credits
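
Several of these problems can be caught before launching the app. The sketch below is a hypothetical pre-flight check, not part of the repository: it verifies the Python version, looks for ffmpeg on PATH, and checks for an OPENAI_API_KEY environment variable (the variable name the OpenAI Python client reads by default; whether this project's .env uses that exact name is an assumption). It reads the process environment only and does not parse .env itself.

```python
import os
import shutil
import sys
from typing import List


def check_environment() -> List[str]:
    """Hypothetical pre-flight check; an empty list means the basics look OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    # shutil.which returns None when the executable is not found on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Assumed variable name; match whatever key name your .env actually uses
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set (check your .env file)")
    return problems
```

Printing each entry of `check_environment()` gives a quick list of what to fix.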

License

MIT License