Description
Current Behavior
The audio transcription feature in Fabric currently processes audio chunks sequentially, which creates a significant performance bottleneck when transcribing large audio files.
When an audio file exceeds the 25MB limit, it is split into multiple chunks using ffmpeg, but these chunks are then processed one at a time in a for loop. This means:
- Each chunk must wait for the previous chunk to complete transcription before starting
- Network latency and API processing time compound linearly with the number of chunks
- For a file split into 30 chunks (~40 min of WAV audio), the total time is roughly 30x that of a single chunk
Proposal
Implement parallel processing of audio chunks to significantly reduce transcription time for large files. Multiple chunks should be transcribed concurrently, with results assembled in the correct order.
This would dramatically speed up the process.
Implementation Considerations
Concurrency Control
- Configurable parameter for the maximum number of concurrent transcriptions (e.g., a --max-concurrent flag)
- Default to a reasonable limit (e.g., 3-5 concurrent requests); a sketch of this pattern follows this list
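A minimal sketch of the bounded-concurrency pattern, assuming a buffered-channel semaphore; the transcribe helper, the maxConcurrent parameter, and the default of 5 are placeholders rather than existing Fabric code:

```go
package main

import (
	"fmt"
	"sync"
)

// transcribe is a placeholder for the real per-chunk API call.
func transcribe(path string) string {
	return fmt.Sprintf("transcript of %s", path)
}

// transcribeAll caps in-flight requests with a buffered-channel semaphore;
// maxConcurrent would come from the proposed --max-concurrent flag.
func transcribeAll(chunkPaths []string, maxConcurrent int) []string {
	if maxConcurrent <= 0 {
		maxConcurrent = 5 // assumed default, within the 3-5 range suggested above
	}
	sem := make(chan struct{}, maxConcurrent)
	results := make([]string, len(chunkPaths)) // indexed writes keep chunk order
	var wg sync.WaitGroup

	for i, path := range chunkPaths {
		wg.Add(1)
		go func(i int, path string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release the slot
			results[i] = transcribe(path)
		}(i, path)
	}
	wg.Wait()
	return results
}
```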
API Rate Limits
- Respect OpenAI API rate limits to avoid 429 errors
- Consider implementing exponential backoff for retries (a rough sketch follows this list)
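A rough sketch of per-chunk retry with exponential backoff; errRateLimited and transcribeChunk are stand-ins for however the OpenAI client actually surfaces HTTP 429, not existing Fabric code:

```go
package main

import (
	"context"
	"errors"
	"time"
)

// errRateLimited stands in for the client's real rate-limit error.
var errRateLimited = errors.New("rate limited")

// transcribeChunk is a placeholder for the real per-chunk API call.
func transcribeChunk(ctx context.Context, path string) (string, error) {
	return "", errRateLimited
}

// transcribeWithRetry retries a single chunk, doubling the wait after
// each rate-limited attempt, and gives up after maxRetries retries.
func transcribeWithRetry(ctx context.Context, path string, maxRetries int) (string, error) {
	backoff := time.Second
	for attempt := 0; ; attempt++ {
		text, err := transcribeChunk(ctx, path)
		if err == nil {
			return text, nil
		}
		if attempt >= maxRetries || !errors.Is(err, errRateLimited) {
			return "", err
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, 4s, ...
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
```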
Technical Details
I assume the main change would be in the TranscribeFile function in internal/plugins/ai/openai/openai_audio.go, where chunks are currently processed sequentially. The loop structure would need to be refactored to run the per-chunk requests in parallel (using goroutines?).
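One possible shape for that refactor, using golang.org/x/sync/errgroup to bound concurrency and an indexed slice to reassemble results in order; transcribeChunk here is a stand-in for the current loop body, not the real TranscribeFile internals:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"golang.org/x/sync/errgroup"
)

// transcribeChunk is a placeholder for the per-chunk OpenAI request
// the current sequential loop makes.
func transcribeChunk(ctx context.Context, path string) (string, error) {
	return fmt.Sprintf("transcript of %s", path), nil
}

// transcribeChunks runs up to maxConcurrent transcriptions at once and
// joins the results in the original chunk order.
func transcribeChunks(ctx context.Context, chunkPaths []string, maxConcurrent int) (string, error) {
	results := make([]string, len(chunkPaths)) // indexed writes preserve order

	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxConcurrent) // would be wired to the proposed --max-concurrent flag

	for i, path := range chunkPaths {
		i, path := i, path // capture loop variables (needed before Go 1.22)
		g.Go(func() error {
			text, err := transcribeChunk(ctx, path)
			if err != nil {
				return fmt.Errorf("chunk %d (%s): %w", i, path, err)
			}
			results[i] = text
			return nil
		})
	}

	if err := g.Wait(); err != nil {
		return "", err
	}
	return strings.Join(results, " "), nil
}
```

With this shape, g.Wait returns the first error and the shared context cancels the remaining chunks, which keeps failure behavior close to the current sequential loop.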