Discussion: run data streaming outside notebook environment #42

Tom-Willemsen · 2022-02-24T09:24:41Z

We are running into an increasing number of issues/complications with running the data streaming code within a notebook environment, for example:

- Needing to use asyncio to not block the notebook entirely
- Leaking threads/asyncio tasks when re-running jupyter cells
- Re-running the cell containing the plot can cause various exceptions if this occurs while one of the data streaming tasks is updating it (race condition)

There are also potential complications with having data streaming in a notebook from a data analysis perspective, where we would want to re-stream reduced data for consumption by analysis programs. We feel that having this in a notebook is error-prone if a user changes the notebook mid-stream.

While these issues may all be fixable, it feels like we are not using notebooks "as intended" here and therefore are exposing ourselves to more bugs/complications than necessary.

I think we should discuss alternative approaches for running the data streaming infrastructure outside a notebook environment, for example running the stream listener as a standalone background python task, and having the data streaming plot be a matplotlib plot outside of a notebook environment.
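To make the standalone idea concrete, here is a minimal sketch of a listener running as a background asyncio task in a plain script, with explicit cleanup on exit. Everything in it is hypothetical: `listen` emits integers instead of real streamed data, and `on_update` stands in for whatever would redraw a matplotlib figure. It is not the existing data streaming code.

```python
import asyncio


async def listen(queue, n_messages):
    """Hypothetical stand-in for the stream listener: emits integers, not real data."""
    for i in range(n_messages):
        await queue.put(i)
        await asyncio.sleep(0)  # yield control, as a real network read would
    await queue.put(None)  # sentinel: the stream has finished


async def consume(queue, on_update):
    """Hand each received item to a callback (e.g. something that redraws a plot)."""
    while True:
        item = await queue.get()
        if item is None:
            break
        on_update(item)


async def main(n_messages=5):
    seen = []
    queue = asyncio.Queue()
    listener = asyncio.create_task(listen(queue, n_messages))
    try:
        await consume(queue, seen.append)
    finally:
        # The script owns the listener's lifetime, so cleanup happens in one place.
        listener.cancel()
        await asyncio.gather(listener, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the sketch is only that the process start and exit define the lifetime of the listener, so there is no "re-run the cell" path that could leave an old task behind.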

SimonHeybrock · 2022-02-28T09:06:42Z

Is the key difficulty here that we cannot use a context manager in a notebook in a convenient manner?

SimonHeybrock · 2022-03-01T14:10:18Z

> I think we should discuss alternative approaches for running the data streaming infrastructure outside a notebook environment, for example running the stream listener as a standalone background python task, and having the data streaming plot be a matplotlib plot outside of a notebook environment.

Is there anything in the current implementation that prevents this? That is, is it either-or?

Tom-Willemsen · 2022-03-01T19:03:43Z

> Is the key difficulty here that we cannot use a context manager in a notebook in a convenient manner?

I think a context manager would only help if the code inside the context manager was blocking? If it's an asyncio call then it wouldn't help as we still wouldn't have a way to close the old threads/asyncio tasks when the cell gets re-run.

I feel that running blocking code in the notebook is not the right approach - even if the issues with plot interactivity could be fixed, there are other issues when trying to re-run cells containing blocking code (need to explicitly break the interpreter, wait for it to time out, then re-run the cell).

> Is there anything in the current implementation that prevents this? That is, is it either-or?

I think there's probably not anything specific in the implementation that prevents this, beyond the need to produce, test and document an "alternative" approach (if that's what we decide we want).

I'm not currently convinced that the overhead of maintaining both solutions would be worth it, but happy to have my opinion changed on this - what do you see as the advantage of the current solution which we couldn't reproduce in some alternative (e.g. standalone) solution? I guess maybe scientist familiarity with the notebook environment?

SimonHeybrock · 2022-03-02T07:26:51Z

> If it's an asyncio call then it wouldn't help as we still wouldn't have a way to close the old threads/asyncio tasks when the cell gets re-run.

Wouldn't the context manager's __exit__ take care of that?

> what do you see as the advantage of the current solution

I do not know enough about the current state... is there an implemented solution, apart from just something that shows how this is possible in a notebook?

> which we couldn't reproduce in some alternative (e.g. standalone) solution?

Not having to write a custom application. But I do not have enough information to tell whether this is really simpler with Jupyter plus, e.g., Voila.

nvaytet · 2022-03-03T07:09:13Z

One thing that currently only works in a notebook is the instrument view, so if live streaming into an instrument view is a must-have (I don't know if it is, maybe it's not the most useful visualization to have), then we still have to do things in a notebook, or at least voila.

Tom-Willemsen · 2022-03-09T11:25:14Z

> Wouldn't the context manager's __exit__ take care of that?

I'm not sure how this helps. If the task is blocking, then the context manager's __exit__ will never be called as the task gets forcibly terminated by jupyter if it's still running after a timeout, I believe. If the task is non-blocking, then the __exit__ would be called immediately?
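A small sketch of the non-blocking case, using only plain asyncio outside Jupyter. `StreamSession`, `fake_stream` and `notebook_cell` are hypothetical names made up for illustration, not part of the existing code. The body of the `with` block only schedules a task and returns, so `__exit__` runs straight away and cancels the task before it has received anything.

```python
import asyncio


class StreamSession:
    """Hypothetical context manager that cancels its task on exit."""

    def __init__(self):
        self.task = None
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, *exc_info):
        self.events.append("exit")
        if self.task is not None:
            self.task.cancel()
        return False


async def fake_stream(received):
    """Stand-in for a streaming task: appends a counter forever."""
    n = 0
    while True:
        received.append(n)
        n += 1
        await asyncio.sleep(0.01)


async def notebook_cell():
    """Plays the role of a notebook cell that starts streaming without blocking."""
    received = []
    session = StreamSession()
    with session:
        # Scheduling the task does not block, so the with-block ends immediately.
        session.task = asyncio.create_task(fake_stream(received))
        session.events.append("body done")
    # Give the event loop time to process the cancellation.
    await asyncio.sleep(0.05)
    return session.events, received, session.task.cancelled()
```

Running `asyncio.run(notebook_cell())` shows the order enter, body done, exit, with nothing received, which is why a context manager on its own does not keep a non-blocking stream alive for the lifetime of the cell's output.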

There may be some hook in jupyter/ipython where we can listen for a "stop" event, but I didn't find one yet...
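Without such a hook, one possible workaround is to keep a handle to the task started by a cell and cancel it whenever the same cell starts a new one. The sketch below assumes a module-level registry keyed by a user-chosen name; `_RUNNING`, `start_stream` and `fake_listener` are hypothetical and not part of the current implementation. It only addresses leaked asyncio tasks, not threads or the plot race condition.

```python
import asyncio

# Hypothetical registry: name -> most recently started asyncio.Task
_RUNNING = {}


async def start_stream(name, coro_factory):
    """Cancel any task previously started under `name`, then start a new one."""
    old = _RUNNING.get(name)
    if old is not None and not old.done():
        old.cancel()
        # Wait until the old task has actually finished unwinding.
        await asyncio.gather(old, return_exceptions=True)
    task = asyncio.create_task(coro_factory())
    _RUNNING[name] = task
    return task


async def fake_listener():
    """Stand-in for a long-running stream listener."""
    while True:
        await asyncio.sleep(0.01)


async def simulate_rerun():
    """Start a stream twice under the same name, as re-running a cell would."""
    first = await start_stream("plot", fake_listener)
    await asyncio.sleep(0.02)
    second = await start_stream("plot", fake_listener)
    result = (first.cancelled(), second.done())
    # Clean up so the example does not itself leak a task.
    second.cancel()
    await asyncio.gather(second, return_exceptions=True)
    return result
```

In a notebook the cell would call something like `await start_stream("plot", ...)`, so a re-run replaces the old task instead of adding a second one. The cost is hidden global state, which is arguably another sign of working against the notebook model.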

> I do not know enough about the current state... is there an implemented solution, apart from just something that shows how this is possible in a notebook?

There is some code that displays data-streaming specific widgets in a notebook environment, for example. Parts or all of that might need to be rewritten if we decided to use a different solution. The plotting code should in principle be runnable outside a notebook, but is likely to need tweaking as it's only ever been tested in notebooks. But other than that, I'd say most of the underlying code is independent of running in a notebook or not.

SimonHeybrock · 2022-05-06T09:45:50Z

Partially related to this discussion, the consensus is that the current requirements for data streaming are too fuzzy and maybe too ambitious. This appears to stop us from making actual progress. Therefore:

- Aim for a minimal working but useful solution.
  - Must get away from the "we have a working prototype" situation as soon as possible.
- Should be trivial to launch, e.g., based on config file, without complicated setup.
  - Integration tests to ensure this keeps working over the coming years.
- Minimal features:
  - Small number of pre-configured live-updating plots (instrument view, normal plots).

SimonHeybrock mentioned this issue Jun 21, 2022

Dashboard / live data considerations scipp/scippneutron#341

Closed

SimonHeybrock transferred this issue from scipp/scippneutron Apr 18, 2023

SimonHeybrock removed this from Development Board Apr 18, 2023

SimonHeybrock closed this as completed Jan 16, 2025
