hc2p commented Nov 14, 2025

Summary

Adds comprehensive execution tracking and optional timeout monitoring for Jupyter notebook cells to help debug stuck executions and improve observability.

This PR builds on #31 (enhanced logging) and adds execution-level tracking on top of the logging infrastructure.

Features

1. Execution Tracking (Always On)

  • Tracks all cell executions with start/end events
  • Logs execution duration and success status
  • Publishes execution metadata to the webapp via a custom MIME type
  • Minimal overhead on normal execution
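The always-on tracking above can be sketched with IPython's event callbacks. This is a minimal illustration, not the code in `execution_tracking.py`: the class name, the internal counter used for `count`, and the 50-character preview are assumptions. It registers on `pre_run_cell`/`post_run_cell` rather than `pre_execute`/`post_execute`, because only the `*_run_cell` events receive the cell source (`ExecutionInfo`) and the outcome (`ExecutionResult`).

```python
import logging
import time
from typing import Any, Optional

logger = logging.getLogger("execution_tracking_sketch")


class ExecutionTracker:
    """Hypothetical tracker: records the start and end of each cell and logs both."""

    def __init__(self, preview_len: int = 50) -> None:
        self.preview_len = preview_len
        self._start: Optional[float] = None
        self._count = 0  # assumption: a local counter stands in for execution_count
        self.last_record: Optional[dict] = None

    def on_pre_run_cell(self, info: Any) -> None:
        # `info` is IPython's ExecutionInfo; raw_cell holds the cell source.
        # cell_id only exists on newer IPython versions, hence getattr.
        self._start = time.perf_counter()
        self._count += 1
        preview = (info.raw_cell or "").strip().replace("\n", " ")[: self.preview_len]
        logger.info(
            "EXEC_START | count=%d | cell_id=%s | preview=%s",
            self._count, getattr(info, "cell_id", None), preview,
        )

    def on_post_run_cell(self, result: Any) -> None:
        # `result` is IPython's ExecutionResult; .success is False if the cell raised.
        if self._start is None:
            return  # post event without a matching pre event
        duration = time.perf_counter() - self._start
        error = result.error_in_exec or result.error_before_exec
        self.last_record = {
            "count": self._count,
            "duration": duration,
            "success": result.success,
            "error_type": type(error).__name__ if error is not None else None,
        }
        logger.info(
            "EXEC_END | count=%d | duration=%.2fs | success=%s",
            self._count, duration, result.success,
        )
        self._start = None  # clear state for the next cell


def register(ip: Any, tracker: ExecutionTracker) -> None:
    # `ip` is an InteractiveShell, e.g. the value of IPython.get_ipython().
    ip.events.register("pre_run_cell", tracker.on_pre_run_cell)
    ip.events.register("post_run_cell", tracker.on_post_run_cell)
```

The handlers only read a clock and write two log lines, which is why this kind of tracking can stay on by default.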

2. Execution Metadata Publishing

  • Publishes structured metadata via `application/vnd.deepnote.execution-metadata+json`
  • Includes: execution_count, duration, success status, error type, timestamp
  • Enables webapp to display execution metrics in real-time
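A sketch of building and publishing the metadata bundle. The MIME type is the one named above; the payload keys are assumptions, since the PR lists the fields but not their exact names. `publish_display_data` is IPython's public wrapper around the shell's `display_pub`, so the bundle travels on the kernel's normal output stream for any frontend that understands the custom MIME type.

```python
import time
from typing import Optional

DEEPNOTE_EXECUTION_METADATA_MIME_TYPE = "application/vnd.deepnote.execution-metadata+json"


def build_execution_metadata(
    execution_count: int,
    duration: float,
    success: bool,
    error_type: Optional[str] = None,
) -> dict:
    """Return a MIME bundle carrying execution metadata (key names are illustrative)."""
    payload = {
        "execution_count": execution_count,
        "duration": round(duration, 3),
        "success": success,
        "error_type": error_type,
        # Wall-clock completion time from time.time(), not the duration.
        "timestamp": time.time(),
    }
    return {DEEPNOTE_EXECUTION_METADATA_MIME_TYPE: payload}


def publish_execution_metadata(bundle: dict) -> None:
    # Imported lazily so building the bundle does not require a running kernel.
    from IPython.display import publish_display_data

    publish_display_data(bundle)
```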

3. Timeout Monitoring (Opt-in)

  • Optional monitoring with configurable thresholds
  • Sends warnings for long-running executions
  • Can optionally auto-interrupt stuck executions
  • Thread-safe implementation with proper locking
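The opt-in monitor can be sketched with one `threading.Timer` per threshold, started when a cell begins and cancelled when it ends. This is a hypothetical illustration, not `execution_timeout.py`: the class name, callbacks, and the generation counter are assumptions, and the `os.kill(..., SIGINT)` step assumes a POSIX kernel process (on Windows, `os.kill` with this signal would terminate the process instead).

```python
import os
import signal
import threading
import time
from typing import Callable, List, Optional

Callback = Callable[[dict], None]


class ExecutionTimeoutMonitor:
    """Hypothetical timer-based monitor illustrating the opt-in timeout feature."""

    def __init__(self, warning_threshold: float = 240.0, timeout_threshold: float = 300.0,
                 auto_interrupt: bool = False, on_warning: Optional[Callback] = None,
                 on_timeout: Optional[Callback] = None) -> None:
        self.warning_threshold = warning_threshold
        self.timeout_threshold = timeout_threshold
        self.auto_interrupt = auto_interrupt
        self.on_warning = on_warning or (lambda info: None)
        self.on_timeout = on_timeout or (lambda info: None)
        self._lock = threading.Lock()
        self._current: Optional[dict] = None
        self._timers: List[threading.Timer] = []
        self._generation = 0  # identifies which execution a timer belongs to

    def on_pre_execute(self, code_preview: str = "") -> None:
        with self._lock:
            self._cancel_timers_locked()
            self._generation += 1
            self._current = {"start_time": time.time(), "code_preview": code_preview}
            for delay, handler in ((self.warning_threshold, self._fire_warning),
                                   (self.timeout_threshold, self._fire_timeout)):
                timer = threading.Timer(delay, handler, args=(self._generation,))
                timer.daemon = True  # a pending timer must not block interpreter exit
                timer.start()
                self._timers.append(timer)

    def on_post_execute(self) -> None:
        with self._lock:
            self._cancel_timers_locked()
            self._current = None

    def _cancel_timers_locked(self) -> None:
        # Caller must hold self._lock.
        for timer in self._timers:
            timer.cancel()
        self._timers = []

    def _snapshot(self, generation: int) -> Optional[dict]:
        # Copy shared state under the lock; callers work on the copy afterwards.
        with self._lock:
            if self._current is None or generation != self._generation:
                return None  # the cell finished, or this timer belongs to an older cell
            info = dict(self._current)
        info["elapsed"] = time.time() - info["start_time"]
        return info

    def _fire_warning(self, generation: int) -> None:
        info = self._snapshot(generation)
        if info is not None:
            self.on_warning(info)

    def _fire_timeout(self, generation: int) -> None:
        info = self._snapshot(generation)
        if info is None:
            return
        self.on_timeout(info)
        if self.auto_interrupt:
            # POSIX only: a SIGINT to the kernel process, like a manual kernel interrupt.
            os.kill(os.getpid(), signal.SIGINT)
```

The generation check covers a timer whose callback was already running when `cancel()` was called: without it, a stale timer could report against the next cell.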

Environment Variables

`DEEPNOTE_ENABLE_EXECUTION_TIMEOUT`

  • Default: `false`
  • When enabled: Activates timeout monitoring
  • Use case: Detecting and handling stuck executions

`DEEPNOTE_EXECUTION_WARNING_THRESHOLD`

  • Default: `240` (4 minutes)
  • What it does: Seconds after which to send a warning
  • Requires: `DEEPNOTE_ENABLE_EXECUTION_TIMEOUT=true`

`DEEPNOTE_EXECUTION_TIMEOUT_THRESHOLD`

  • Default: `300` (5 minutes)
  • What it does: Seconds after which execution is considered stuck
  • Requires: `DEEPNOTE_ENABLE_EXECUTION_TIMEOUT=true`

`DEEPNOTE_EXECUTION_AUTO_INTERRUPT`

  • Default: `false`
  • What it does: Automatically send SIGINT to interrupt stuck executions
  • Warning: Use with caution! This interrupts the running cell.
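The four variables above could be read into one config object at startup. The names and defaults are taken from this PR; the accepted truthy strings and the fallback to the default on an unparseable number are assumptions of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    value = env.get(name)
    if value is None:
        return default
    # Assumption: these spellings count as "true"; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def _env_float(env: Mapping[str, str], name: str, default: float) -> float:
    try:
        return float(env.get(name, default))
    except ValueError:
        return default  # fall back rather than fail kernel startup


@dataclass(frozen=True)
class TimeoutConfig:
    enabled: bool
    warning_threshold: float
    timeout_threshold: float
    auto_interrupt: bool


def load_timeout_config(env: Optional[Mapping[str, str]] = None) -> TimeoutConfig:
    env = os.environ if env is None else env
    return TimeoutConfig(
        enabled=_env_bool(env, "DEEPNOTE_ENABLE_EXECUTION_TIMEOUT"),
        warning_threshold=_env_float(env, "DEEPNOTE_EXECUTION_WARNING_THRESHOLD", 240),
        timeout_threshold=_env_float(env, "DEEPNOTE_EXECUTION_TIMEOUT_THRESHOLD", 300),
        auto_interrupt=_env_bool(env, "DEEPNOTE_EXECUTION_AUTO_INTERRUPT"),
    )
```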

Implementation Details

Thread Safety

  • All shared state protected with `threading.Lock()`
  • Timer callbacks copy shared data while holding the lock, then process the copy after releasing it
  • Prevents both race conditions and deadlocks
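The "copy under lock, process outside lock" rule can be shown in isolation. In this hypothetical sketch the lock is a plain, non-reentrant `threading.Lock`, so a callback that did its reporting while still holding it would deadlock as soon as the reporter touched the shared state again.

```python
import threading
from typing import Callable, Optional


class SharedExecutionState:
    """Hypothetical holder for the currently running execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # non-reentrant
        self._current: Optional[dict] = None

    def set(self, info: Optional[dict]) -> None:
        with self._lock:
            self._current = info

    def clear(self) -> None:
        self.set(None)

    def snapshot(self) -> Optional[dict]:
        # The lock is held only for the duration of the copy.
        with self._lock:
            return dict(self._current) if self._current is not None else None


def timer_callback(state: SharedExecutionState, report: Callable[[dict], None]) -> None:
    info = state.snapshot()
    if info is None:
        return  # execution finished before the timer fired
    # The lock is released here, so `report` may do slow I/O or even call
    # state.clear() without deadlocking on the non-reentrant lock.
    report(info)
```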

Production Safety

  • Execution tracking has minimal overhead
  • Timeout monitoring is opt-in
  • Auto-interrupt is disabled by default
  • No performance impact when features are disabled

Testing

To test execution tracking:
```python
# Execute a cell and check logs
print("Hello, world!")
```

Check logs for:
```
EXEC_START | count=1 | cell_id=... | preview=print("Hello, world!")
EXEC_END | count=1 | duration=0.01s | success=True
```
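To check these lines programmatically, for example in a test, the format above can be produced and parsed with two small helpers. The `EXEC_START`/`EXEC_END` layout is copied from the sample output; the helper names and the 50-character preview limit are hypothetical.

```python
def format_exec_start(count: int, cell_id: str, code: str, preview_len: int = 50) -> str:
    preview = code.strip().replace("\n", " ")[:preview_len]
    return f"EXEC_START | count={count} | cell_id={cell_id} | preview={preview}"


def format_exec_end(count: int, duration: float, success: bool) -> str:
    return f"EXEC_END | count={count} | duration={duration:.2f}s | success={success}"


def parse_exec_line(line: str) -> dict:
    # Both formats have exactly four " | "-separated parts; maxsplit=3 keeps a
    # preview that itself contains " | " in one piece.
    event, *pairs = line.split(" | ", 3)
    fields = {"event": event}
    for pair in pairs:
        key, _, value = pair.partition("=")  # split at the first "=" only
        fields[key] = value
    return fields
```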

To test timeout monitoring:
```bash
export DEEPNOTE_ENABLE_EXECUTION_TIMEOUT=true
export DEEPNOTE_EXECUTION_WARNING_THRESHOLD=5
deepnote-toolkit server
```

Then execute a long-running cell:
```python
import time
time.sleep(10)
```

After 5 seconds, you should see a `LONG_EXECUTION` warning in the logs.

Files Added

  • `deepnote_toolkit/execution_tracking.py`: Execution tracking implementation
  • `deepnote_toolkit/execution_timeout.py`: Timeout monitoring implementation
  • `docs/EXECUTION_TRACKING.md`: Comprehensive documentation

Files Modified

  • `deepnote_toolkit/ipython_utils.py`: Added execution metadata publishing
  • `deepnote_toolkit/runtime_initialization.py`: Integrated tracking and timeout features

Documentation

See `docs/EXECUTION_TRACKING.md` for:

  • Feature overview and architecture
  • Configuration options
  • Debugging procedures
  • Testing instructions
  • Future enhancements

Checklist

  • Code follows project style guidelines
  • Pre-commit hooks pass (flake8, isort)
  • Thread-safe implementation
  • Production-safe (minimal overhead, opt-in features)
  • Comprehensive documentation
  • All PR review issues addressed

Related

Summary by CodeRabbit

  • New Features

    • Added automatic notebook cell execution tracking with duration and status logging.
    • Introduced configurable execution timeout monitoring with warning thresholds and optional auto-interrupt for long-running cells.
    • Enhanced debugging capabilities through environment-based configuration.
  • Documentation

    • Updated README with debugging configuration options.
    • Added comprehensive guide for execution tracking and timeout monitoring.

hc2p added 7 commits November 14, 2025 20:07
- Track execution start/end with timestamps
- Log execution duration and success status
- Publish execution metadata to webapp
- Register IPython event handlers for monitoring
- Monitor long-running executions with configurable thresholds
- Send warnings when executions exceed warning threshold
- Optional auto-interrupt for stuck executions via SIGINT
- Report warnings/timeouts to webapp
- Configurable via environment variables:
  - DEEPNOTE_ENABLE_EXECUTION_TIMEOUT
  - DEEPNOTE_EXECUTION_WARNING_THRESHOLD (default: 240s)
  - DEEPNOTE_EXECUTION_TIMEOUT_THRESHOLD (default: 300s)
  - DEEPNOTE_EXECUTION_AUTO_INTERRUPT (default: false)
- Add publish_execution_metadata() function
- Define DEEPNOTE_EXECUTION_METADATA_MIME_TYPE constant
- Publish structured execution data via display_pub
- Include duration, success status, and error type in metadata
- Import and setup execution tracking during runtime init
- Add optional execution timeout monitor setup
- Configure timeout monitor via environment variables
- Add error handling for both features
- Maintain backward compatibility (timeout monitor disabled by default)
- Document execution tracking and timeout monitoring
- Include configuration examples and environment variables
- Provide debugging guide for stuck executions
- Explain log formats and locations
- Add testing instructions and examples
- List all modified/created files
- Include future enhancement ideas
- Fix LoggerManager usage: use LoggerManager().get_logger() instead of LoggerManager.get_logger()
- Fix webapp URL import: use get_absolute_userpod_api_url() instead of non-existent get_webapp_url()
- All imports and functionality tests now pass
- Fix incorrect timestamp: use time.time() instead of duration in metadata
- Move time imports to module level in execution_timeout.py
- Add threading lock to fix race condition in timeout monitoring
  - Protect current_execution access with lock
  - Copy execution data before processing outside lock
- All fixes validated and tested
hc2p requested a review from a team as a code owner November 14, 2025 19:11
coderabbitai bot commented Nov 14, 2025

Caution

Review failed

Failed to post review comments

📝 Walkthrough

This pull request introduces comprehensive execution monitoring and debugging capabilities for Jupyter notebooks. It adds execution timeout tracking with optional auto-interrupt, execution event logging with metadata publishing, and enhanced debug logging configuration. New modules execution_timeout.py and execution_tracking.py provide event handlers for the cell execution lifecycle. These are integrated into runtime initialization and wired to IPython's event system. Configuration is driven by environment variables (DEEPNOTE_ENABLE_DEBUG_LOGGING, DEEPNOTE_ENABLE_ZMQ_DEBUG, DEEPNOTE_ENABLE_EXECUTION_TIMEOUT). Jupyter server logging and ZMQ debug settings are applied conditionally. Documentation and README updates explain the features and their usage.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Cell as Cell Execution
    participant Tracker as ExecutionTracker
    participant Logger as Logger
    participant Webapp as Webapp

    rect rgb(200, 220, 255)
    Note over Cell,Webapp: Execution Start
    Cell->>Tracker: pre_execute event
    Tracker->>Tracker: Record start_time,<br/>cell_preview, id
    Tracker->>Logger: Log execution start
    end

    rect rgb(240, 240, 240)
    Note over Cell: Cell runs...
    Cell->>Cell: code executes
    end

    rect rgb(220, 255, 220)
    Note over Cell,Webapp: Execution Complete
    Cell->>Tracker: post_execute event
    Tracker->>Tracker: Compute duration,<br/>success, error_type
    Tracker->>Logger: Log execution end
    Tracker->>Webapp: publish_execution_metadata
    Webapp-->>Tracker: response
    Tracker->>Tracker: Clear state
    end
```
```mermaid
sequenceDiagram
    participant Cell as Cell Execution
    participant Monitor as ExecutionTimeoutMonitor
    participant Timer as Timers
    participant Webapp as Webapp

    rect rgb(200, 220, 255)
    Note over Cell,Timer: Execution Start
    Cell->>Monitor: on_pre_execute event
    Monitor->>Timer: Start warning_timer
    Monitor->>Timer: Start timeout_timer<br/>(if enabled)
    Monitor->>Monitor: Record start_time,<br/>code_preview
    end

    par Warning Path
        rect rgb(255, 240, 200)
        Note over Monitor,Webapp: Warning Threshold Hit
        Timer->>Monitor: warning_timer expires
        Monitor->>Webapp: POST /execution/warning
        Webapp-->>Monitor: success
        end
    and Timeout Path (if enabled)
        rect rgb(255, 200, 200)
        Note over Monitor,Webapp: Timeout Threshold Hit
        Timer->>Monitor: timeout_timer expires
        Monitor->>Webapp: POST /execution/timeout
        Webapp-->>Monitor: success
        Monitor->>Monitor: Send SIGINT<br/>to process
        end
    end

    rect rgb(220, 255, 220)
    Note over Cell,Timer: Execution Completes
    Cell->>Monitor: on_post_execute event
    Monitor->>Timer: Cancel all timers
    Monitor->>Monitor: Clear execution state
    end
```

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title directly matches the PR's main objective: adding execution tracking and timeout monitoring for Jupyter cells, as evidenced by multiple new modules, configuration changes, and documentation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 94.44% which is sufficient. The required threshold is 80.00%. |


@github-actions

📦 Python package built successfully!

  • Version: 1.1.3.dev11+0aea754
  • Wheel: deepnote_toolkit-1.1.3.dev11+0aea754-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/1.1.3.dev11%2B0aea754/deepnote_toolkit-1.1.3.dev11%2B0aea754-py3-none-any.whl"


codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 23.63636% with 126 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.35%. Comparing base (153ce69) to head (8063f60).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| deepnote_toolkit/execution_timeout.py | 21.34% | 70 Missing ⚠️ |
| deepnote_toolkit/execution_tracking.py | 26.00% | 37 Missing ⚠️ |
| deepnote_toolkit/runtime_initialization.py | 16.66% | 15 Missing ⚠️ |
| deepnote_toolkit/ipython_utils.py | 50.00% | 4 Missing ⚠️ |
Additional details and impacted files
```diff
@@                     Coverage Diff                     @@
##           hannes/enhanced-logging      #32      +/-   ##
===========================================================
- Coverage                    72.88%   71.35%   -1.54%
===========================================================
  Files                           93       95       +2
  Lines                         5142     5307     +165
  Branches                       754      765      +11
===========================================================
+ Hits                          3748     3787      +39
- Misses                        1150     1276     +126
  Partials                       244      244
```
| Flag | Coverage Δ |
|---|---|
| combined | 71.35% <23.63%> (-1.54%) ⬇️ |

Flags with carried forward coverage won't be shown.


hc2p changed the base branch from main to hannes/enhanced-logging November 14, 2025 19:14
@deepnote-bot

🚀 Review App Deployment Started

| 📝 Description | 🌐 Link / Info |
|---|---|
| 🌍 Review application | ra-32 |
| 🔑 Sign-in URL | Click to sign-in |
| 📊 Application logs | View logs |
| 🔄 Actions | Click to redeploy |
| 🚀 ArgoCD deployment | View deployment |
| Last deployed | 2025-11-14 19:20:15 (UTC) |
| 📜 Deployed commit | afabcd53851702ab6ddc42b9bd082e3d772a0221 |
| 🛠️ Toolkit version | 0aea754 |
