Skip to content

anndata_dask_array.ipynb: copy tweaks / improvements #19

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ryan-williams
Copy link

@ryan-williams ryan-williams commented Oct 30, 2024

anndata_dask_array.ipynb

  • Seed da.random/np.random ⟹ allow regenerating the notebook deterministically.
  • Copy tweaks: some typos, some more opinionated rephrasing, couple places described things that didn't match the cell outputs (e.g. claiming a slice result was eager when it wasn't), couple places I added a cell and some copy.
  • Kernel name: python3.bakpython3

Dockerfile / regenerate.sh

Regenerate notebook files deterministically (in Docker / using juq to clean notebooks / canonicalize outputs), e.g.:

./regenerate.sh anndata_dask_array

I've used it here on anndata_dask_array.ipynb, but not any other notebooks.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ryan-williams ryan-williams force-pushed the copy branch 2 times, most recently from c400939 to 1eafd86 Compare October 30, 2024 21:31
@flying-sheep
Copy link
Member

Hi, thank you very much, this is a great idea. We really need to standardize a reproducibility story here, thank you for getting started here. I don‘t think pip install will install the same versions without pip-timemachine or a lockfile, so this is not completely reproducible. It would be amazing though if we found a way to get pixel-perfect graphics that don‘t generate a diff every time.

One thing about your solution: The font rendering is kind of ugly. Can you check if installing ipycytoscape instead of graphviz (and maybe also if setting dask’s config visualization.engine to it) improves this?

@flying-sheep
Copy link
Member

Hia, I don’t know if you saw my comments above: Could you please take a look?

- build a Python 3.11.8 / Ubuntu Docker image
- mount in `$PWD`
- execute notebook(s)
- clean results (remove execution/timing metadata, canonicalize outputs)
@ryan-williams
Copy link
Author

ryan-williams commented Feb 7, 2025

Sorry, I finally got back to this.

One thing about your solution: The font rendering is kind of ugly. Can you check if installing ipycytoscape instead of graphviz (and maybe also if setting dask’s config visualization.engine to it) improves this?

ipycytoscape embeds "widgets" in the notebook, which render as placeholder text when I reload the notebook:

ipycytoscape widget text output

I'm assuming they also won't render when compiled into a docsite either. lmk if I missed something there.

On the Graphviz side, Dask's to_graphviz hard-codes fontname="helvetica", but it renders differently for me on macOS vs. Ubuntu (and slightly differently still in an Ubuntu Docker image on an Ubuntu host, which I used in this PR). The existing images seem to match what I see when I run it on my Macbook, and both Ubuntu versions have "uglier" fonts. Here are samples:

macOS

mac

Ubuntu Docker (this PR)

dkr

Ubuntu

ubuntu

Let me know how you want to proceed. I was also thinking I should add an example of reading from Dask (based on Scanpy's Dask tutorial, that @ivirshup pointed me at)

"One-liner" I used for extracting images (for reference, incl. my own)
r=copy  # Git ref
f=anndata_dask_array.ipynb  # notebook path
git show $r:$f \
| jq -r '.cells[] | (.outputs // [])[].data.["image/png"] | select(.)' \
| head -n1 \
| base64 -d \
> $r.png

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants