Different parts of this repo need different versions of python #316

klaragerlei · 2021-09-20T11:46:30Z

Is your feature request related to a problem? Please describe.
The pipeline uses Python 3.6 and the shuffled analysis uses Python 3.8, so the DataFrame outputs of the two are not compatible: Python 3.6 cannot open pickles saved by 3.8. For now this can be managed by keeping multiple virtual environments on Eleanor.
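A quick way to check which side of the incompatibility a given file is on is to read its header. This is a minimal sketch, assuming the files are uncompressed pickles (compressed ones such as `.pkl.gz` will not expose the header). Pickles written with protocol 2 or later start with the PROTO opcode `0x80` followed by one byte holding the protocol number; `pickle_protocol_of` is a hypothetical helper name, not part of the repo.

```python
import pickle
import sys
from typing import Optional


def pickle_protocol_of(path: str) -> Optional[int]:
    """Return the pickle protocol a file was written with, if the header shows it.

    Protocols 2 and above begin with the PROTO opcode (0x80) followed by a
    single byte holding the protocol number. Protocols 0 and 1 have no such
    header, so None is returned for them.
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None


if __name__ == "__main__":
    # Usage: python check_protocol.py file1.pkl file2.pkl ...
    for name in sys.argv[1:]:
        proto = pickle_protocol_of(name)
        # Python 3.6 understands protocols 0 to 4; protocol 5 needs 3.8+.
        readable = proto is None or proto <= 4
        print(f"{name}: protocol {proto}, readable by Python 3.6: {readable}")
```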

Describe the solution you'd like
Update the pipeline to use Python 3.8.

Describe alternatives you've considered
Keep using the workaround. I think this will cause a lot of issues for less experienced users.

4iar · 2021-09-21T09:02:06Z

Could this be solved by specifying the maximum pickle protocol in the shuffled-analysis code, so that it saves dataframes that are backwards compatible with the 3.6 pipeline?

e.g. df.to_pickle('cat.pkl', protocol=4)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.

Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

From: https://docs.python.org/3/library/pickle.html
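The quote above says protocol 4 is already the default in Python 3.8, which can look like a contradiction. The catch is that pandas' `DataFrame.to_pickle` defaults to `pickle.HIGHEST_PROTOCOL` (5 on 3.8), not `pickle.DEFAULT_PROTOCOL`, so the protocol has to be passed explicitly. A minimal sketch of picking one that both interpreters support; `PIPELINE_MAX_PROTOCOL` and `compatible_protocol` are hypothetical names for illustration:

```python
import pickle

# Highest protocol the Python 3.6 pipeline can read (protocol 5 needs 3.8+).
PIPELINE_MAX_PROTOCOL = 4  # hypothetical constant name


def compatible_protocol(max_supported: int = PIPELINE_MAX_PROTOCOL) -> int:
    """Pick the newest protocol supported by both this interpreter and the pipeline."""
    return min(pickle.HIGHEST_PROTOCOL, max_supported)


if __name__ == "__main__":
    print("DEFAULT_PROTOCOL:", pickle.DEFAULT_PROTOCOL)
    print("HIGHEST_PROTOCOL:", pickle.HIGHEST_PROTOCOL)
    print("protocol to pass to to_pickle:", compatible_protocol())
```

In the shuffled-analysis code this would then be used as `df.to_pickle(path, protocol=compatible_protocol())`.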

This would only affect newly saved dataframes, but you could write a quick 3.8 script to glob, load, and re-save your existing dataframes using protocol 4.
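One possible shape for that conversion script, as a sketch to run under Python 3.8. It uses the standard-library `pickle` module rather than `pd.read_pickle` / `DataFrame.to_pickle`, which should be equivalent for uncompressed `.pkl` files (pandas infers compression from the file extension, and `.pkl` means none); for compressed files, use the pandas functions instead. The glob pattern and `resave_with_protocol` are hypothetical placeholders. Each file is written to a temporary path first and then swapped in with `os.replace`, so a failure mid-write does not leave a truncated pickle behind.

```python
import glob
import os
import pickle

# Hypothetical location of the shuffled-analysis outputs; adjust to your data.
PATTERN = "/path/to/shuffled_analysis/**/*.pkl"
TARGET_PROTOCOL = 4  # highest protocol Python 3.6 can load


def resave_with_protocol(pattern: str, protocol: int = TARGET_PROTOCOL) -> list:
    """Load every pickle matching `pattern` and re-save it in place with `protocol`.

    Returns the list of paths that were rewritten.
    """
    rewritten = []
    for path in glob.glob(pattern, recursive=True):
        with open(path, "rb") as f:
            obj = pickle.load(f)
        # Write to a temporary file, then atomically replace the original.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(obj, f, protocol=protocol)
        os.replace(tmp_path, path)
        rewritten.append(path)
    return rewritten


if __name__ == "__main__":
    for p in resave_with_protocol(PATTERN):
        print("re-saved", p)
```

Note that the protocol is only part of the story: pandas only guarantees that pickles are backwards compatible, so the pandas version in the 3.6 environment may still fail to load DataFrames pickled by a much newer pandas.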

(Python 3.8 does have the walrus operator so it would be nice to upgrade someday anyway...)

:=

klaragerlei · 2021-09-21T09:40:23Z

df.to_pickle('cat.pkl', protocol=4)

I like this idea. @HDClark94, is there any reason for using protocol 5, or would it be okay to change this?

klaragerlei added the feature request Enhancement or feature request label Sep 20, 2021

klaragerlei self-assigned this Sep 20, 2021

klaragerlei changed the title ~~Different part of this repo need different versions of python~~ Different parts of this repo need different versions of python Sep 20, 2021

klaragerlei added the priority-medium Medium priority task label Sep 20, 2021
