Integrate transcript timestamps into summary bullets#32
Conversation
Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
…cision Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
The MAX_TRANSCRIPT_SEGMENTS constant was removed to allow for unlimited transcript segments in the newsletter digest. The bullet point limit in the documentation was increased from 5 to 10 to accommodate more detailed content for complex videos.
There was a problem hiding this comment.
Pull request overview
This PR enhances the newsletter digest feature by integrating clickable timestamps from video transcripts into summary bullet points. Users can now click on timestamps in MM:SS format to jump directly to relevant moments in videos.
Key Changes
- Modified
get_recent_transcriptsto preserve structured transcript data with timestamps instead of concatenating text - Updated
generate_newsletter_digestto format timestamps in prompts and instruct the AI to generate MM:SS links - Added backward compatibility for legacy plain text transcript format
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| app.py | Core implementation: preserves transcript timestamps in get_recent_transcripts and formats them for AI processing in generate_newsletter_digest with backward compatibility fallback |
| tests/test_transcripts.py | Updates test assertions to expect list-of-dict transcript format instead of concatenated strings |
| tests/test_openai_digest.py | Adds new tests for timestamp integration and backward compatibility with legacy string format |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| continue | ||
|
|
||
| results_data.append({"video_id": video_id, "title": title, "transcript": transcript_text}) | ||
| results_data.append({"video_id": video_id, "title": title, "transcript": transcript_items}) |
There was a problem hiding this comment.
The function docstring for get_recent_transcripts should be updated to reflect that the transcript field now contains structured data (list of dictionaries with 'text' and 'start' keys) rather than a plain text string. This is a breaking change in the return type that should be documented.
There was a problem hiding this comment.
@copilot please resolve
There was a problem hiding this comment.
Updated in commit 8352bfb. The docstring now documents that transcript is a list of dictionaries with 'text' (str) and 'start' (float) keys.
| transcript_formatted = "" | ||
| for segment in transcript_data: | ||
| timestamp_seconds = round(segment['start']) | ||
| transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} " |
There was a problem hiding this comment.
The new transcript format lacks truncation that existed for the old format. When transcript_data is a list, the formatted transcript is built without any size limit, while the old format truncated at 25000 characters. This could cause context window overflow for OpenAI API calls when processing videos with very long transcripts. Consider adding a limit on the number of segments processed or truncating the transcript_formatted string.
| transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} " | |
| transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} " | |
| # Truncate to avoid overly long prompts, matching old-format behavior | |
| transcript_formatted = transcript_formatted[:25000] |
There was a problem hiding this comment.
Added truncation at 25000 characters in commit 8352bfb to match the old format behavior and prevent context overflow.
| ... (Provide between 2 and 5 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.) | ||
| - **[MM:SS](https://www.youtube.com/watch?v=<Video ID>&t=<seconds>s)** - <Bullet 1: Specific, actionable detail> | ||
| - **[MM:SS](https://www.youtube.com/watch?v=<Video ID>&t=<seconds>s)** - <Bullet 2: Specific, actionable detail> | ||
| ... (Provide between 2 and 10 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.) |
There was a problem hiding this comment.
The bullet point range was changed from "2 and 5" to "2 and 10" without apparent justification. This significantly increases the maximum number of bullets, which could lead to excessively long summaries. Consider whether this change aligns with the goal of "concise, high-value weekly digest" mentioned in the system prompt, or if it should be reverted to the original range.
| ... (Provide between 2 and 10 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.) | |
| ... (Provide between 2 and 5 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.) |
There was a problem hiding this comment.
Reverted to 2-5 bullet points in commit 8352bfb to maintain concise summaries aligned with the "concise, high-value weekly digest" goal.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tore bullet count Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
Plan for Integrating Transcript Timestamps into Summary Bullets
get_recent_transcriptsto preserve timestamp data from transcriptsgenerate_newsletter_digestto pass timestamp data to OpenAI✅ Implementation Complete
All tasks completed successfully! The feature is ready for production use.
Summary of Changes
Core Functionality:
get_recent_transcripts()to preserve timestamp data from transcriptsgenerate_newsletter_digest()to format and pass timestamp data to OpenAICode Quality & Review Fixes:
4. Updated docstring to document new transcript format (list of dicts with 'text' and 'start')
5. Added robust error handling and validation for segment structure
6. Added type safety checks for backward compatibility with string transcripts
7. Added truncation (25000 chars) for new format to match old format behavior
8. Restored bullet point range to 2-5 (from 2-10) for concise summaries
Testing & Validation:
9. All 45 unit tests passing
10. Added specific tests for timestamp integration
11. Linting passes (ruff, flake8)
12. Type checking passes (mypy)
13. CodeQL security scan: 0 alerts
Security Summary
No security vulnerabilities detected in the changes.
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.