EchoSynth transforms audio content into refined text and visual assets through a coordinated team of AI agents. It's designed for podcasters, content creators, educators, and anyone working with spoken media who wants to automatically generate high-quality derivative content.
- Audio Transcription - Accurate text conversion from various audio formats
- Speech Refinement - Transforms raw transcripts into polished speeches
- Smart Summarization - Creates concise text summaries capturing key points
- Image Generation - Produces relevant visuals that match content themes
- Coordinated AI Agents - Specialized AI agents working together via CrewAI
- Flexible Pipeline - Modular architecture that supports customization
- JSON Output - Saves all generated content in structured JSON format
- CrewAI - Agent orchestration framework
- OpenAI Whisper - Speech-to-text transcription
- OpenAI GPT-4o - Text processing and refinement
- OpenAI DALL-E 3 - Image generation
- Python 3.9+ - Core programming language
- Python 3.9 or higher
- OpenAI API key with access to the Whisper, GPT-4o, and DALL-E 3 models
- Audio files (.mp3, .mp4, .mpeg, .mpga, .m4a, .wav, or .webm)
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/EchoSynth.git
   cd EchoSynth
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up your environment variables:

   ```bash
   export OPENAI_API_KEY="your_openai_api_key_here"
   ```
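Before running the pipeline, it can help to confirm the key is actually visible to Python. A minimal sanity check (a hypothetical helper, not part of EchoSynth itself):

```python
import os

def has_openai_key(env=os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))
```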
```python
from echo_synth.flows import EchoSynthFlow

# Initialize the flow with your audio file
flow = EchoSynthFlow(audio_file_path="data/your_audio_file.mp3")

# Run the processing pipeline
flow.run()

# Access the results
transcription = flow.state.transcribed_text
speech = flow.state.speech_text
summary = flow.state.summary_text
image_path = flow.state.image_file
results_json = flow.state.results_json

print(f"Processed {flow.audio_file_path}")
print(f"Generated image saved to: {image_path}")
print(f"Results JSON saved to: {results_json}")
```

You can also set the audio file path via an environment variable:

```bash
export AUDIO_FILE_PATH="data/your_audio_file.mp3"
```

```python
from echo_synth.flows import EchoSynthFlow

# The audio path will be read from environment variables
flow = EchoSynthFlow()
flow.run()
```

EchoSynth uses a multi-agent architecture powered by CrewAI:
- Agent 1 (Text Transcribe) - Converts audio to accurate text using Whisper API
- Agent 2 (Speech Writer) - Refines raw transcripts into polished, structured speech
- Agent 3 (Summary for Image) - Creates descriptive content for image generation
- Agent 4 (Summarizer) - Produces concise summaries of the key content
- Agent 5 (Generate Image) - Creates visual representations using DALL-E
- Agent 6 (Save data to JSON) - Compiles all outputs into a structured JSON file
The flow is coordinated through CrewAI's sequential pipeline, ensuring each agent receives the proper inputs from previous steps and all results are saved in a structured format.
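The sequential hand-off described above can be sketched in plain Python. This is an illustrative stand-in, not EchoSynth's actual implementation: the function names and state fields are assumptions modeled on the `flow.state` attributes shown earlier.

```python
from dataclasses import dataclass

@dataclass
class FlowState:
    """Illustrative stand-in for the flow's shared state."""
    transcribed_text: str = ""
    speech_text: str = ""
    summary_text: str = ""

def transcribe(state: FlowState) -> FlowState:
    # Agent 1 would call Whisper here; we fake its output
    state.transcribed_text = "raw transcript"
    return state

def write_speech(state: FlowState) -> FlowState:
    # Agent 2 only sees what Agent 1 wrote into the shared state
    state.speech_text = f"Polished: {state.transcribed_text}"
    return state

def summarize(state: FlowState) -> FlowState:
    # Agent 4 condenses the refined speech
    state.summary_text = state.speech_text[:20]
    return state

def run_sequential(state: FlowState, steps) -> FlowState:
    # Each step receives the state produced by the previous one
    for step in steps:
        state = step(state)
    return state

final = run_sequential(FlowState(), [transcribe, write_speech, summarize])
```

The key property of the sequential process is that later agents depend only on what earlier agents wrote into the shared state, which is what makes individual steps easy to swap out.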
- Tool 1 (Whisper STT) - Used by Agent 1 for transcription
- Tool 2 (Sentiment Analysis) - Used by Agent 1 for audio content analysis
- Tool 3 (DALL-E) - Used by Agent 5 for image generation
- Tool 4 (FileWriter) - Used by Agent 6 to save outputs to JSON
Ensure your audio file exists and the path is correct. EchoSynth will search in:
- The absolute path provided
- The current working directory
- A `data/` subdirectory in the working directory
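The search order can be mirrored in a small helper. This is a hypothetical sketch of the lookup logic described above, not EchoSynth's actual resolver:

```python
from pathlib import Path

def resolve_audio_path(path_str: str):
    """Return the first existing location per the search order, else None."""
    candidate = Path(path_str)
    # 1. The absolute path provided
    if candidate.is_absolute():
        return candidate if candidate.exists() else None
    # 2. Relative to the current working directory
    in_cwd = Path.cwd() / candidate
    if in_cwd.exists():
        return in_cwd
    # 3. Inside a data/ subdirectory of the working directory
    in_data = Path.cwd() / "data" / candidate
    if in_data.exists():
        return in_data
    return None
```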
Make sure your OPENAI_API_KEY environment variable is correctly set and has access to the required models.
OpenAI's Whisper API has a 25 MB file size limit. For larger files, split the audio into segments under the limit, or compress it to a smaller format, before transcription.
If you encounter issues with JSON output:
- Check that all agent outputs are valid and complete
- Ensure the FileWriter tool has proper permissions to write to the output directory
- Verify the JSON structure matches your expected schema
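A quick way to check the last two points, assuming a flat schema built from the state fields shown earlier (the exact key names are an assumption; adjust them to your actual output):

```python
import json

# Assumed output schema; adjust to match your results JSON
EXPECTED_KEYS = {"transcribed_text", "speech_text", "summary_text", "image_file"}

def missing_result_keys(path: str) -> list:
    """Return a sorted list of expected keys absent from the results file."""
    with open(path) as f:
        data = json.load(f)
    return sorted(EXPECTED_KEYS - data.keys())
```

An empty return value means the file parses as valid JSON and contains every expected key.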
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
