Added example Podcast_and_Audio_Transcription #665
Adds automated audio transcription using Gemini 2.0 with:
✅ Speaker identification (labeled or as Speaker A/B)
✅ Precision timestamps ([HH:MM:SS])
✅ Music/sound effect detection (e.g., [Jingle] or [Song Name])
✅ Clean text output with [END] marker
Testing: Verified with podcasts & call recordings.
Deps: jinja2, Gemini API client.
Useful for podcasts, interviews, and call analysis.
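For reviewers who want to sanity-check the output format described above, here is a minimal sketch of a parser for such a transcript. It assumes each spoken line looks like `[HH:MM:SS] Speaker: text` and that the transcript stops at `[END]`; that exact line shape, along with the names `Segment` and `parse_transcript`, is hypothetical and not taken from the notebook.

```python
import re
from typing import List, NamedTuple

# Assumed line shape: "[HH:MM:SS] Speaker: text" (hypothetical, not from the notebook).
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+([^:]+):\s*(.*)$")


class Segment(NamedTuple):
    seconds: int   # offset from the start of the audio
    speaker: str   # e.g. "Speaker A" or a detected name
    text: str      # what was said


def parse_transcript(raw: str) -> List[Segment]:
    """Collect spoken segments until the [END] marker is reached."""
    segments = []
    for line in raw.splitlines():
        line = line.strip()
        if line == "[END]":
            break  # everything after the marker is ignored
        match = LINE_RE.match(line)
        if match is None:
            # Blank lines and sound-event tags such as "[00:00:00] [Jingle]"
            # have no "Speaker:" part, so they are skipped here.
            continue
        hours, minutes, secs, speaker, text = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(secs)
        segments.append(Segment(offset, speaker.strip(), text))
    return segments
```

A check like this could help confirm that timestamps are monotonically increasing and that every line has a speaker label before the example is merged.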
Thanks @SonjeVilas, that's an interesting example. I won't have time to review it today but I'll try to do it next week.
Hello @SonjeVilas,
That's a nice example. On top of what @andycandy already reported, and the minor stuff I pointed out, can you:
- move the notebook to the examples/ directory
- add a link to it in the examples' README
- add a "what's next" section at the end of the notebook, pointing to similar notebooks (or just your preferred ones)
- run the formatting script (cf. https://github.com/google-gemini/cookbook/actions/runs/14262332934/job/39989516304?pr=665)
Thanks again!
@Giom-V Thanks for the review... :)
@nikitamaia Thanks for the review :)
Line #4. file_path = "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"
I think we need to find a better example with 2 speakers to showcase the diarization. What about something like https://archive.org/details/Apollo11Audio (not the whole recording but a specific part). They also have some open-sourced podcasts I think.
In any case, whatever the source of your audio file, don't forget to cite where it comes from.
Hello @SonjeVilas, do you still want to push that example?
Thanks for the reminder! I will complete this PR this weekend.