-
Notifications
You must be signed in to change notification settings - Fork 0
Description
We use topoly in AFM-SPM/TopoStats and discovered a regression in performance under Python 3.12 when topoly loads the dictionary that is contained in polvalues.py.
I'm copying over the investigation I made here and have been in contact with @prubach directly about this (via email but also see comments further down in that thread).
We would like TopoStats to support Python 3.12 so anything we can do to help investigate and resolve this matter please let us know.
I've been doing some profiling and have tracked down the cause of this hang to be a module from Topoly.
The attached profile was obtained by running...
git switch 4b7d25813
python -m cProfile -o prof/test_run_filters_cProfile_hangs.prof $(which pytest) tests/test_processing.py::test_run_filtersIt can then be visualised with the following (you don't have to run the profiling yourself you can just download the attached file and run the following after installing snakeviz...
# Optionally install snakeviz
pip install snakeviz
# Uncompress the file
gunzip test_run_filters_cProfile_hangs.prof.gz
# Run snakeviz on the file
snakeviz path/to/test_run_filters_cProfile_hangs.profA browser tab will open and at the top it shows you that the module where most time is spent is polyvalues.py. Only a single call is made to this module...

The second box from the left (may differ on your view) with no "icicles" is the polyvalues.py module and it doesn't actually call or run any processes itself, it just takes a long time to do....something?!?!?!
test_run_filters_cProfile_hangs.prof.gz
If I repeat this on the previous commit (git switch 560d0b19d and repeat the profiling) there is no such problem, no call is made to polyvalues.py.
To check I've compared the two commits shows this appears to be the first point at which topoly modules are imported, we could use -G topoly to search the differences when checked out on the commit that fails and the previous commit but this exceeds DFT_GRAPH_LIMIT so I have gone with pipping it into grep (NB this is the output from difftastic and shows a line has been added to HEAD (at line 11) that isn't in the previous commit (HEAD~1), I limit the output to the first 20 lines by piping to head -n 20)...
❱ git diff HEAD HEAD~1 | grep topoly -A3 -B3
❱ git diff HEAD HEAD~1 | grep topoly -A3 -B3
555- 8 8 import numpy.typing as npt
556- 9 9 import pandas as pd
557-10 10 from skimage.morphology import label
558:11 .. from topoly import jones, translate_code
559-12 11
560-13 12 from topostats.logs.logs import LOGGER_NAME
561-14 13 from topostats.tracing.tracingfuncs import coord_dist, genTracingFuncs, order_branch, reorderTrace
--
847-349 if len(nxyz) > 2 and end_to_end_dist_square ...
848-... d <= 2: # pylint: disable=chained-comparison ...
849-350 # single coord traces mean nxyz[0]==[1] ...
850:... so cause issues when duplicating for topoly ...
851-351 nxyz = np.append(nxyz, nxyz[0][np.newax ...
852-... is, :], axis=0) ...
853-352 simple_coords.append(nxyz) ... I'm not sure why this is even being called though as I though topoly has only been introduced to undertake work in DNA tracing and shouldn't affect the very early stage of filters? Certainly none of the files in topostats/filters.py have been touched...
git diff --name-only HEAD HEAD~1
topostats/processing.py
topostats/tracing/ordered_tracing.py
topostats/tracing/splining.pyAlthough perhaps my understanding is wrong and everything is getting initalised for some reason. 🤔
@MaxGamill-Sheffield @SylviaWhittle @llwiggins : do you observe this long hang when running the tests? Do you get similar profiling?
