App should automatically split audio files >25 MB and transcribe each part #8

Open
Bklieger opened this issue Jun 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@Bklieger
Owner

Currently, the app can only handle audio files smaller than 25 MB, because Whisper's maximum input file size is 25 MB. We can work around this limitation by splitting larger audio files into several parts, each of which can be transcribed by the API separately. The results can then be combined into one transcript.
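As a rough sketch of the splitting idea, the snippet below plans time windows for the parts and joins the partial transcripts. It is not the app's implementation: `plan_chunks`, `combine_transcripts`, and the 10% safety margin are hypothetical, and it assumes a roughly constant bitrate so that file size scales with duration. Cutting the audio itself and calling the transcription API are left out.

```python
import math

MAX_CHUNK_BYTES = 25 * 1024 * 1024  # 25 MB limit described in this issue
SAFETY_MARGIN = 0.9  # assumption: 10% headroom for bitrate variation


def plan_chunks(file_size_bytes, duration_seconds,
                max_chunk_bytes=MAX_CHUNK_BYTES, safety_margin=SAFETY_MARGIN):
    """Return (start, end) windows in seconds, each expected to fit the limit.

    Assumes file size is roughly proportional to duration.
    """
    if file_size_bytes <= 0 or duration_seconds <= 0:
        raise ValueError("file size and duration must be positive")
    if file_size_bytes < max_chunk_bytes:
        # Small enough already: transcribe in one call.
        return [(0.0, float(duration_seconds))]
    budget = max_chunk_bytes * safety_margin
    n_chunks = math.ceil(file_size_bytes / budget)
    chunk_len = duration_seconds / n_chunks
    return [
        (i * chunk_len, min((i + 1) * chunk_len, float(duration_seconds)))
        for i in range(n_chunks)
    ]


def combine_transcripts(parts):
    """Join per-chunk transcripts in order, skipping empty pieces."""
    return " ".join(p.strip() for p in parts if p.strip())
```

For example, a 60 MB file lasting 600 seconds would be planned as three 200-second windows. Splitting at fixed times can cut a word in half, so a real implementation might prefer to cut at silences or overlap the windows slightly.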

I believe there still needs to be an upper limit on the file size to keep Whisper API cost under control. In addition, if the transcript becomes too large (in number of tokens), Groq API rate limits may cause the Groq API calls to fail, so there should be a check on the transcript size as well.
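The two checks could look something like the sketch below. All names here are hypothetical. The 100 MB cap is the figure from the commit that references this issue; the 4-characters-per-token ratio is only a rough heuristic, and `MAX_TRANSCRIPT_TOKENS` is a placeholder, not an actual Groq limit, so it would need to be set from the rate limits that apply to the account and model in use.

```python
import math

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100 MB cap from the referencing commit
CHARS_PER_TOKEN = 4            # assumption: rough heuristic, not a tokenizer
MAX_TRANSCRIPT_TOKENS = 8000   # placeholder value, not a real Groq limit


def check_upload_size(file_size_bytes, max_bytes=MAX_UPLOAD_BYTES):
    """Reject uploads above the overall cap before any API call is made."""
    if file_size_bytes > max_bytes:
        raise ValueError(
            f"file is {file_size_bytes} bytes; the limit is {max_bytes} bytes"
        )


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate the token count from character length (approximate)."""
    return math.ceil(len(text) / chars_per_token)


def transcript_fits(text, max_tokens=MAX_TRANSCRIPT_TOKENS):
    """Return True if the estimated token count is within the budget."""
    return estimate_tokens(text) <= max_tokens
```

Running the size check before splitting avoids paying for transcription of a file that would be rejected anyway. If `transcript_fits` returns False, the app could show an error or process the transcript in sections rather than sending it in one request.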

@Bklieger Bklieger added the enhancement New feature or request label Jun 23, 2024
Bklieger referenced this issue Jun 23, 2024
Split audio files greater than 25 MB into several files which are each transcribed by the API. The results are combined into one transcript. An upper limit of 100 MB is applied.
@Bklieger
Owner Author

@MentatBot



mentatbot bot commented Jun 27, 2024

I will start working on this issue
