EchoSynth 🎙️➡️📝🖼️

EchoSynth transforms audio content into refined text and visual assets through a coordinated team of AI agents. It's designed for podcasters, content creators, educators, and anyone working with spoken media who wants to automatically generate high-quality derivative content.

🌟 Features

Audio Transcription - Accurate text conversion from various audio formats
Speech Refinement - Transforms raw transcripts into polished speeches
Smart Summarization - Creates concise text summaries capturing key points
Image Generation - Produces relevant visuals that match content themes
Coordinated AI Agents - Specialized AI agents working together via CrewAI
Flexible Pipeline - Modular architecture that supports customization
JSON Output - Saves all generated content in structured JSON format

🛠️ Tech Stack

CrewAI - Agent orchestration framework
OpenAI Whisper - Speech-to-text transcription
OpenAI GPT-4o - Text processing and refinement
OpenAI DALL-E 3 - Image generation
Python 3.9+ - Core programming language

📋 Prerequisites

Python 3.9 or higher
OpenAI API key with access to the Whisper, GPT-4o, and DALL-E 3 models
Audio files (.mp3, .mp4, .mpeg, .mpga, .m4a, .wav, or .webm)
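
The format list above can be checked before any API call is made. A minimal sketch, assuming a hypothetical helper named `is_supported_audio` (it is not part of EchoSynth):

```python
from pathlib import Path

# Extensions copied from the prerequisites list above.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for candidate in ("data/episode.MP3", "data/notes.txt"):
        print(candidate, is_supported_audio(candidate))
```

This only inspects the file name; it does not verify that the file contents are valid audio.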

🚀 Installation

Clone the repository:

git clone https://github.com/yourusername/EchoSynth.git
cd EchoSynth

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

Set up your environment variables:

export OPENAI_API_KEY="your_openai_api_key_here"

💻 Usage

Basic Example

from echo_synth.flows import EchoSynthFlow

# Initialize the flow with your audio file
flow = EchoSynthFlow(audio_file_path="data/your_audio_file.mp3")

# Run the processing pipeline
flow.run()

# Access the results
transcription = flow.state.transcribed_text
speech = flow.state.speech_text
summary = flow.state.summary_text
image_path = flow.state.image_file
results_json = flow.state.results_json

print(f"Processed {flow.audio_file_path}")
print(f"Generated image saved to: {image_path}")
print(f"Results JSON saved to: {results_json}")

Using Environment Variables

You can also set the audio file path through the AUDIO_FILE_PATH environment variable:

export AUDIO_FILE_PATH="data/your_audio_file.mp3"

from echo_synth.flows import EchoSynthFlow

# The audio path will be read from environment variables
flow = EchoSynthFlow()
flow.run()
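
One way such a fallback could work is sketched below. This is an assumption about the behavior, not EchoSynth's actual code, and `resolve_audio_argument` is a hypothetical helper: an explicit argument wins, and the environment variable is used only when no argument is given.

```python
import os
from typing import Optional


def resolve_audio_argument(audio_file_path: Optional[str] = None) -> str:
    """Prefer the explicit argument; otherwise fall back to AUDIO_FILE_PATH."""
    path = audio_file_path or os.environ.get("AUDIO_FILE_PATH")
    if not path:
        raise ValueError(
            "No audio file given: pass audio_file_path or set AUDIO_FILE_PATH."
        )
    return path
```

Failing early with a clear message is usually easier to debug than an error raised later inside the transcription step.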

🧠 Architecture

EchoSynth uses a multi-agent architecture powered by CrewAI:

Agent 1 (Text Transcribe) - Converts audio to accurate text using the Whisper API
Agent 2 (Speech Writer) - Refines raw transcripts into polished, structured speech
Agent 3 (Summary for Image) - Creates descriptive content for image generation
Agent 4 (Summarizer) - Produces concise summaries of the key content
Agent 5 (Generate Image) - Creates visual representations using DALL-E
Agent 6 (Save data to JSON) - Compiles all outputs into a structured JSON file

The flow is coordinated through CrewAI's sequential pipeline, ensuring each agent receives the proper inputs from previous steps and all results are saved in a structured format.
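
The sequential hand-off described above can be illustrated without CrewAI at all. The sketch below uses plain functions as stand-ins for agents; the field names mirror the `flow.state` attributes from the usage example, while the step functions and their placeholder outputs are hypothetical and do not call any model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FlowState:
    # Field names mirror the flow.state attributes shown in the usage example.
    transcribed_text: str = ""
    speech_text: str = ""
    summary_text: str = ""


def transcribe(state: FlowState) -> FlowState:
    # Placeholder for a speech-to-text call.
    state.transcribed_text = "hello world this is a test"
    return state


def write_speech(state: FlowState) -> FlowState:
    # Placeholder refinement: reads the transcript produced by the previous step.
    state.speech_text = state.transcribed_text.capitalize() + "."
    return state


def summarize(state: FlowState) -> FlowState:
    # Placeholder summary: keeps the first three words of the refined speech.
    state.summary_text = " ".join(state.speech_text.split()[:3])
    return state


def run_sequential(state: FlowState, steps: List[Callable[[FlowState], FlowState]]) -> FlowState:
    """Run each step in order, passing the shared state along."""
    for step in steps:
        state = step(state)
    return state


if __name__ == "__main__":
    final = run_sequential(FlowState(), [transcribe, write_speech, summarize])
    print(final)
```

Because every step reads from and writes to the same state object, reordering or swapping steps only requires changing the list passed to `run_sequential`.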

Tools Used by Agents:

Tool 1 (Whisper STT) - Used by Agent 1 for transcription
Tool 2 (Sentiment Analysis) - Used by Agent 1 for audio content analysis
Tool 3 (DALL-E) - Used by Agent 5 for image generation
Tool 4 (FileWriter) - Used by Agent 6 to save outputs to JSON

🔍 Troubleshooting

Common Issues:

Audio File Not Found

Ensure your audio file exists and the path is correct. EchoSynth will search in:

The absolute path provided
The current working directory
A data/ subdirectory in the working directory
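
To debug a missing file, the three locations above can be probed by hand. A minimal sketch, assuming a hypothetical helper `find_audio_file`; the order in which EchoSynth itself checks these locations is not documented here, so the order below is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_audio_file(path: str, cwd: Optional[Path] = None) -> Path:
    """Return the first existing candidate, or raise listing every path tried."""
    cwd = cwd or Path.cwd()
    given = Path(path)
    candidates = []
    if given.is_absolute():
        candidates.append(given)                 # the absolute path provided
    else:
        candidates.append(cwd / given)           # the current working directory
    candidates.append(cwd / "data" / given.name)  # a data/ subdirectory
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"Audio file not found. Tried: {tried}")
```

The error message lists every path that was checked, which makes a wrong working directory easy to spot.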

API Key Issues

Make sure your OPENAI_API_KEY environment variable is correctly set and has access to the required models.
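
A quick presence check can rule out the simplest cause. The helper below is a hypothetical sketch, not part of EchoSynth; it confirms only that the variable is set and non-empty, not that the key is valid or has access to the required models.

```python
import os
from typing import Mapping, Optional


def check_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key if OPENAI_API_KEY is set and non-empty, else raise."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set or is empty.")
    return key
```

Calling `check_openai_key()` before starting the flow surfaces a missing key immediately instead of partway through the pipeline.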

File Size Limits

OpenAI's Whisper API has a 25 MB file size limit. For larger files, split the audio into smaller segments or re-encode it at a lower bitrate before processing.
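
The size limit can be checked locally before uploading. A minimal sketch with a hypothetical helper `exceeds_upload_limit`; treating 25 MB as 25 * 1024 * 1024 bytes is an assumption about how the limit is counted.

```python
import os

# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def exceeds_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

If the check returns True, the file needs to be split or compressed before it is passed to the flow.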

JSON Output

If you encounter issues with JSON output:

Check that all agent outputs are valid and complete
Ensure the FileWriter tool has proper permissions to write to the output directory
Verify the JSON structure matches your expected schema
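
The structure check in the last item can be scripted. The sketch below assumes the JSON keys mirror the `flow.state` attribute names from the usage example; the actual schema written by EchoSynth may differ, so adjust `EXPECTED_KEYS` to match your output. `missing_keys` is a hypothetical helper.

```python
import json
from typing import Iterable, Set

# Assumption: output keys match the flow.state attribute names.
EXPECTED_KEYS = {"transcribed_text", "speech_text", "summary_text", "image_file"}


def missing_keys(json_text: str, expected: Iterable[str] = EXPECTED_KEYS) -> Set[str]:
    """Parse the JSON text and return the expected keys that are absent."""
    data = json.loads(json_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return set(expected) - set(data.keys())
```

An empty set means every expected key is present; a `json.JSONDecodeError` means the file itself is malformed.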

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
