Skip to content

Conversation

eschutho
Copy link
Member

SUMMARY

When an alert or report fails, this PR adds the ability to capture a screenshot of the current state and include it in the error notification email sent to owners. This helps admins/owners quickly identify issues without needing to navigate to the dashboard or chart.

Key changes:

  • Modified send_error() in superset/commands/report/execute.py to attempt screenshot capture before sending error notifications
  • Updated email notification templates in superset/reports/notifications/email.py to include inline screenshot images
  • Added comprehensive unit tests for error emails with and without screenshots
  • Implemented best-effort approach: screenshots are captured if possible, but failures don't prevent error notifications from being sent

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before: Error emails only contained text describing the error

After: Error emails include a screenshot showing the state of the dashboard/chart at the time of failure (when screenshot capture is successful)

TESTING INSTRUCTIONS

  1. Set up an alert or report that will fail (e.g., configure it to run on a broken dashboard or chart)
  2. Trigger the alert/report execution
  3. Verify that:
    • Error notification email is sent to the owner
    • Email includes a screenshot of the dashboard/chart (if screenshot capture succeeded)
    • Email still sends even if screenshot capture fails
  4. Run unit tests: pytest tests/unit_tests/reports/notifications/email_tests.py -v

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

🤖 Generated with Claude Code

When an alert or report fails, capture a screenshot of the current state
and include it in the error notification email sent to owners. This helps
admins/owners quickly identify issues without needing to navigate to the
dashboard or chart.

Key changes:
- Modified send_error() to attempt screenshot capture before sending error notifications
- Updated email notification templates to include inline screenshot images
- Added comprehensive tests for error emails with and without screenshots
- Implemented best-effort approach: screenshots are captured if possible, but failures don't prevent error notifications from being sent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Status
Performance Inefficient string concatenation in loop ▹ view ✅ Fix detected
Files scanned
File Path Reviewed
superset/reports/notifications/email.py
superset/commands/report/execute.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

Comment on lines 126 to 129
for msgid in images.keys():
img_tag_str += (
f'<div class="image"><img width="1000" src="cid:{msgid}"></div>'
)

This comment was marked as resolved.

@eschutho eschutho marked this pull request as draft October 15, 2025 00:57
Modify Selenium webdriver to capture screenshots of whatever state exists
even when timeouts occur. This is especially useful for error notifications
where we want to show admins the state of the page at the time of failure,
even if the page didn't fully load.

Changes:
- TimeoutExceptions no longer immediately raise - instead we try to capture the current page state
- If the target element isn't found, fall back to full page screenshot
- Improved logging to distinguish between successful loads and partial captures

This only affects Selenium (not Playwright) as Selenium maintains browser state
after timeouts while Playwright does not.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eschutho and others added 4 commits October 16, 2025 13:47
Don't retry screenshot capture in send_error() since we may not capture
the same state. Selenium webdriver now handles partial screenshot
capture on timeout automatically.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
When Selenium timeouts occur during screenshot capture, we now properly
raise an error with the partial screenshot attached. This ensures:
- Timeout failures are logged as errors in execution logs
- Error notifications are sent to owners
- Partial screenshots are included in error emails

Changes:
- ReportScheduleScreenshotTimeout now accepts and stores screenshot data
- Selenium webdriver tracks timeouts and raises exception with screenshots
- Error handlers extract screenshots from timeout exceptions
- send_error() accepts screenshots parameter to include in notifications

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Add detailed diagnostic logging when chart containers fail to render:
- Log number of chart/grid/loading elements found
- Extract and log chart identifiers for debugging
- Include DOM preview (first 2000 chars) to help diagnose issues

This telemetry helps identify which specific charts are causing
timeouts and what state the page is in when timeouts occur.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Address korbit-ai review feedback - replace O(n²) string concatenation
with O(n) list append and join pattern for better performance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eschutho and others added 5 commits October 17, 2025 11:35
…rror emails

When a dashboard/chart times out during screenshot capture:
- Display how long the page was loading before timeout
- Show a screenshot of the partial rendering
- Provide actionable performance recommendations:
  - Optimize queries (indexes, reduce data, simplify)
  - Enable and utilize cached data
  - Break complex dashboards into smaller views
  - Review database performance

The elapsed time is tracked from the start of screenshot capture
and passed through the exception chain to the error notification.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Improve code readability by breaking long HTML strings in error template
across multiple lines. HTML rendering is unaffected by whitespace in the
source strings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Include elapsed time in all internal timeout warning logs to help
with debugging performance issues. This complements the timing info
added to error emails.

Changes:
- Element location timeout: Shows elapsed time when element not found
- Chart container timeout: Shows elapsed time in diagnostic logs
- Chart loading timeout: Shows elapsed time when charts don't finish loading

Example log: "Selenium timed out waiting for charts to load at url
http://example.com/dashboard/1 after 15.32 seconds"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Update test_get_screenshot_logs_chart_timeout_details to mock time()
function and add required config keys. The test now properly validates
that elapsed time is included in timeout warning logs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
… to exception

- Move time import to module level in webdriver.py for proper test mocking
- Fix test mock setup for screenshot_as_png property
- Update test assertions to check lazy-formatted log messages with chart IDs
- Add working_timeout parameter to ReportSchedulePreviousWorkingError exception
- Pass working_timeout value when raising the exception

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants