This repository was archived by the owner on Apr 8, 2026. It is now read-only.

Integrate transcript timestamps into summary bullets #32

Merged
greenbrettmichael merged 6 commits into main from copilot/integrate-transcript-timestamps on Dec 30, 2025

Copilot AI commented Dec 27, 2025

Contributor

Plan for Integrating Transcript Timestamps into Summary Bullets

  • Explore repository structure and understand current implementation
  • Verify existing tests pass
  • Modify get_recent_transcripts to preserve timestamp data from transcripts
  • Update generate_newsletter_digest to pass timestamp data to OpenAI
  • Modify OpenAI prompt to instruct inclusion of timestamps in bullet points
  • Format timestamps as clickable YouTube links (format: mm:ss)
  • Create tests for timestamp integration
  • Run tests to validate changes
  • Fix linting issues
  • Manually verify the output format
  • Request code review
  • Address review feedback
  • Run CodeQL security check
  • Address PR review comments
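The "clickable YouTube links (format: mm:ss)" step can be sketched as below. This is a minimal illustration, not the code merged in this PR: `format_timestamp` and `timestamp_link` are hypothetical helper names, and the URL shape is taken from the prompt template quoted later in this thread (`https://www.youtube.com/watch?v=<Video ID>&t=<seconds>s`).

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS; minutes are not wrapped into hours."""
    total = max(0, round(seconds))  # clamp negative offsets to the start of the video
    minutes, secs = divmod(total, 60)
    return f"{minutes:02d}:{secs:02d}"


def timestamp_link(video_id: str, seconds: float) -> str:
    """Markdown link whose label is MM:SS and whose t= parameter is whole seconds."""
    total = max(0, round(seconds))
    return f"[{format_timestamp(total)}](https://www.youtube.com/watch?v={video_id}&t={total}s)"


print(timestamp_link("abc123", 125.6))
# [02:06](https://www.youtube.com/watch?v=abc123&t=126s)
```

Minutes are not folded into hours, so an offset of 3725 s renders as 62:05; an H:MM:SS variant may read better for very long videos. Rounding uses Python's built-in `round`, the same call the transcript formatter in this PR applies to `segment['start']`.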

✅ Implementation Complete

All tasks completed successfully! The feature is ready for production use.

Summary of Changes

Core Functionality:

  1. Modified get_recent_transcripts() to preserve timestamp data from transcripts
  2. Updated generate_newsletter_digest() to format and pass timestamp data to OpenAI
  3. Enhanced OpenAI prompt with explicit instructions for timestamp formatting
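For reference, the structured shape that `get_recent_transcripts` now returns (per the docstring update discussed below: a list of dicts with 'text' (str) and 'start' (float) keys) can be written out as typed dictionaries. `TranscriptSegment`, `VideoTranscript`, and `make_result` are hypothetical names for illustration, and the extra `duration` key in the sample input is an assumption about what a transcript source might return.

```python
from typing import Any, Dict, Iterable, List, TypedDict


class TranscriptSegment(TypedDict):
    text: str
    start: float  # seconds from the beginning of the video


class VideoTranscript(TypedDict):
    video_id: str
    title: str
    transcript: List[TranscriptSegment]


def make_result(video_id: str, title: str, items: Iterable[Dict[str, Any]]) -> VideoTranscript:
    """Keep each segment's text and start time instead of joining the text into one string."""
    segments: List[TranscriptSegment] = [
        {"text": str(item["text"]), "start": float(item["start"])} for item in items
    ]
    return {"video_id": video_id, "title": title, "transcript": segments}


raw = [
    {"text": "Intro", "start": 0, "duration": 4.2},  # hypothetical extra key, dropped
    {"text": "Main point", "start": 61.4, "duration": 3.0},
]
result = make_result("abc123", "Example video", raw)
print(result["transcript"][1])  # {'text': 'Main point', 'start': 61.4}
```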

Code Quality & Review Fixes:

  4. Updated docstring to document new transcript format (list of dicts with 'text' and 'start')
  5. Added robust error handling and validation for segment structure
  6. Added type safety checks for backward compatibility with string transcripts
  7. Added truncation (25000 chars) for new format to match old format behavior
  8. Restored bullet point range to 2-5 (from 2-10) for concise summaries

Testing & Validation:

  9. All 45 unit tests passing
  10. Added specific tests for timestamp integration
  11. Linting passes (ruff, flake8)
  12. Type checking passes (mypy)
  13. CodeQL security scan: 0 alerts
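Items 5 to 7 (segment validation, the plain-string fallback, and the 25000-character cap) fit together roughly as follows. This is a sketch under assumptions, not the merged implementation: `format_transcript_for_prompt` and `MAX_TRANSCRIPT_CHARS` are hypothetical names, malformed segments are silently skipped rather than raising, and the `[Ns] text` marker mirrors the formatting loop shown in the review thread.

```python
import math
from typing import Any, List, Union

MAX_TRANSCRIPT_CHARS = 25000  # cap quoted in the PR for both formats (name is hypothetical)


def format_transcript_for_prompt(transcript: Union[str, List[Any]]) -> str:
    """Build prompt text from either the legacy string or the list-of-segments format."""
    if isinstance(transcript, str):
        # Legacy format: there are no timestamps to attach, so just truncate.
        return transcript[:MAX_TRANSCRIPT_CHARS]

    parts: List[str] = []
    for segment in transcript:
        # Skip entries that are not dicts with a str 'text' and a finite numeric 'start'.
        if not isinstance(segment, dict):
            continue
        text = segment.get("text")
        start = segment.get("start")
        if not isinstance(text, str):
            continue
        if isinstance(start, bool) or not isinstance(start, (int, float)):
            continue
        if not math.isfinite(start):
            continue
        parts.append(f"[{round(start)}s] {text}")
    return " ".join(parts)[:MAX_TRANSCRIPT_CHARS]
```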

Security Summary

No security vulnerabilities detected in the changes.

Original prompt

This section details the original issue you should resolve.

<issue_title>Integrate Transcript Timestamps into Summary Bullets</issue_title>
<issue_description>Objective:
Enhance the summary feature by integrating timestamps from the video transcripts into each summary bullet. This will allow users to jump directly to the relevant point in the video from the summary.

Details:

  • Parse the transcript to extract accurate timestamps for each section or bullet.
  • Update the summary generation logic to include a clickable or clearly displayed timestamp alongside every bullet point.
  • Ensure the timestamps are formatted consistently (e.g., mm:ss) and are accurate.
  • Consider UX/UI implications for both web and text versions of the summary.

Benefit:
This will make summaries more actionable and user-friendly.

Acceptance Criteria:

  • Each summary bullet contains an associated timestamp.
  • Users can use the timestamps to jump to that part in the video (if supported).
  • The feature passes relevant tests and QA review.

Additional Context:

  • Coordinate with existing summary and transcript extraction logic.
  • May require modifications to summary data models and UI components.
    </issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits December 27, 2025 03:25
Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
…cision

Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
Copilot AI changed the title "[WIP] Add timestamps to summary bullets from transcripts" to "Integrate transcript timestamps into summary bullets" on Dec 27, 2025
The MAX_TRANSCRIPT_SEGMENTS constant was removed to allow for
unlimited transcript segments in the newsletter digest. The bullet
point limit in the documentation was increased from 5 to 10 to
accommodate more detailed content for complex videos.
greenbrettmichael marked this pull request as ready for review December 30, 2025 12:00
Copilot AI review requested due to automatic review settings December 30, 2025 12:00

Copilot AI left a comment

Contributor

Pull request overview

This PR enhances the newsletter digest feature by integrating clickable timestamps from video transcripts into summary bullet points. Users can now click on timestamps in MM:SS format to jump directly to relevant moments in videos.

Key Changes

  • Modified get_recent_transcripts to preserve structured transcript data with timestamps instead of concatenating text
  • Updated generate_newsletter_digest to format timestamps in prompts and instruct the AI to generate MM:SS links
  • Added backward compatibility for legacy plain text transcript format

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| app.py | Core implementation: preserves transcript timestamps in get_recent_transcripts and formats them for AI processing in generate_newsletter_digest, with a backward-compatibility fallback |
| tests/test_transcripts.py | Updates test assertions to expect the list-of-dict transcript format instead of concatenated strings |
| tests/test_openai_digest.py | Adds new tests for timestamp integration and backward compatibility with the legacy string format |

Comment thread app.py
```diff
 continue

-results_data.append({"video_id": video_id, "title": title, "transcript": transcript_text})
+results_data.append({"video_id": video_id, "title": title, "transcript": transcript_items})
```

Copilot AI Dec 30, 2025

The function docstring for get_recent_transcripts should be updated to reflect that the transcript field now contains structured data (list of dictionaries with 'text' and 'start' keys) rather than a plain text string. This is a breaking change in the return type that should be documented.

Owner

@copilot please resolve

Contributor Author

Updated in commit 8352bfb. The docstring now documents that transcript is a list of dictionaries with 'text' (str) and 'start' (float) keys.

Comment thread tests/test_openai_digest.py Outdated
Comment thread app.py Outdated
Comment thread app.py Outdated
Comment thread app.py Outdated
Comment thread app.py Outdated
```python
transcript_formatted = ""
for segment in transcript_data:
    timestamp_seconds = round(segment['start'])
    transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} "
```

Copilot AI Dec 30, 2025

The new transcript format lacks truncation that existed for the old format. When transcript_data is a list, the formatted transcript is built without any size limit, while the old format truncated at 25000 characters. This could cause context window overflow for OpenAI API calls when processing videos with very long transcripts. Consider adding a limit on the number of segments processed or truncating the transcript_formatted string.

Suggested change
```diff
-transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} "
+transcript_formatted += f"[{timestamp_seconds}s] {segment['text']} "
+# Truncate to avoid overly long prompts, matching old-format behavior
+transcript_formatted = transcript_formatted[:25000]
```
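The reviewer also mentioned limiting the number of segments instead of slicing the string. A plain `[:25000]` slice can cut through the middle of a `[1234s]` marker or a sentence; one alternative, sketched here with a hypothetical `format_with_budget` helper (not part of this PR), stops at the last whole segment that fits.

```python
from typing import Any, Dict, List


def format_with_budget(segments: List[Dict[str, Any]], max_chars: int = 25000) -> str:
    """Format '[Ns] text ' chunks, stopping before the first one that would overflow."""
    chunks: List[str] = []
    used = 0
    for segment in segments:
        chunk = f"[{round(segment['start'])}s] {segment['text']} "
        if used + len(chunk) > max_chars:
            break  # drop the whole segment rather than emit a partial marker
        chunks.append(chunk)
        used += len(chunk)
    return "".join(chunks)


segments = [
    {"text": "aaaa", "start": 0.0},
    {"text": "bbbb", "start": 5.2},
    {"text": "cccc", "start": 9.9},
]
print(repr(format_with_budget(segments, max_chars=25)))  # '[0s] aaaa [5s] bbbb '
```

Unlike slicing, the result may fall a little short of the cap; the trailing space is kept to match the original loop's output.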

Contributor Author

Added truncation at 25000 characters in commit 8352bfb to match the old format behavior and prevent context overflow.

Comment thread app.py Outdated
```diff
-... (Provide between 2 and 5 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.)
+- **[MM:SS](https://www.youtube.com/watch?v=<Video ID>&t=<seconds>s)** - <Bullet 1: Specific, actionable detail>
+- **[MM:SS](https://www.youtube.com/watch?v=<Video ID>&t=<seconds>s)** - <Bullet 2: Specific, actionable detail>
+... (Provide between 2 and 10 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.)
```

Copilot AI Dec 30, 2025

The bullet point range was changed from "2 and 5" to "2 and 10" without apparent justification. This significantly increases the maximum number of bullets, which could lead to excessively long summaries. Consider whether this change aligns with the goal of "concise, high-value weekly digest" mentioned in the system prompt, or if it should be reverted to the original range.

Suggested change
```diff
-... (Provide between 2 and 10 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.)
+... (Provide between 2 and 5 bullet points. Use fewer for short/simple videos, and more for dense/complex technical content.)
```
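To check that model output actually follows the template (2 to 5 bullets, each starting with a linked MM:SS label), a test could parse the bullets with a regular expression. This is a hypothetical helper, not one of the tests in tests/test_openai_digest.py; it assumes the markdown passed in holds the bullets for a single video and that the model reproduces the template shape exactly.

```python
import re
from typing import List

# One bullet in the shape the prompt template asks for (strictness is an assumption).
BULLET_RE = re.compile(
    r"^- \*\*\[(\d{2,}):(\d{2})\]"
    r"\(https://www\.youtube\.com/watch\?v=([\w-]+)&t=(\d+)s\)\*\* - (.+)$"
)


def check_digest_bullets(
    markdown: str, video_id: str, min_bullets: int = 2, max_bullets: int = 5
) -> List[str]:
    """Return problems found in one video's bullets; an empty list means it passed."""
    problems: List[str] = []
    bullets = [line for line in markdown.splitlines() if line.startswith("- ")]
    if not min_bullets <= len(bullets) <= max_bullets:
        problems.append(f"expected {min_bullets}-{max_bullets} bullets, got {len(bullets)}")
    for line in bullets:
        match = BULLET_RE.match(line)
        if match is None:
            problems.append(f"bad format: {line}")
            continue
        minutes, secs, vid, offset = match.group(1, 2, 3, 4)
        if vid != video_id:
            problems.append(f"wrong video id: {line}")
        # The visible MM:SS label should point at the same moment as the t= parameter.
        if int(minutes) * 60 + int(secs) != int(offset):
            problems.append(f"label does not match t= offset: {line}")
    return problems
```

Checking that the label and the `t=` offset agree catches a failure mode the bullet count alone would miss: a link that looks right but jumps to a different moment.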

Contributor Author

Reverted to 2-5 bullet points in commit 8352bfb to maintain concise summaries aligned with the "concise, high-value weekly digest" goal.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…tore bullet count

Co-authored-by: greenbrettmichael <10648075+greenbrettmichael@users.noreply.github.com>
greenbrettmichael merged commit 3eb76f4 into main Dec 30, 2025
5 checks passed
greenbrettmichael deleted the copilot/integrate-transcript-timestamps branch December 30, 2025 12:24
Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integrate Transcript Timestamps into Summary Bullets

3 participants