Different parts of this repo need different versions of python #316

Open
klaragerlei opened this issue Sep 20, 2021 · 2 comments
Labels: feature request (Enhancement or feature request), priority-medium (Medium priority task)

Comments

@klaragerlei
Collaborator

Is your feature request related to a problem? Please describe.
The pipeline uses Python 3.6 and the shuffled analysis uses Python 3.8, so the data frame outputs of the two are not compatible: Python 3.6 cannot open pickles written by 3.8. This problem can be managed by having multiple virtual environments on Eleanor.

Describe the solution you'd like
Update the pipeline to use Python 3.8.

Describe alternatives you've considered
Keep using the workaround. I think this will cause a lot of issues for less experienced users.

@klaragerlei klaragerlei added the feature request Enhancement or feature request label Sep 20, 2021
@klaragerlei klaragerlei self-assigned this Sep 20, 2021
@klaragerlei klaragerlei changed the title Different part of this repo need different versions of python Different parts of this repo need different versions of python Sep 20, 2021
@klaragerlei klaragerlei added the priority-medium Medium priority task label Sep 20, 2021
@4iar
Collaborator

4iar commented Sep 21, 2021

Could this be solved by specifying the max pickler-protocol in the shuffled-analysis code, so that it saves dataframes that are backwards compatible with the 3.6 pipeline?

e.g. df.to_pickle('cat.pkl', protocol=4)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html
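To make the suggestion above concrete, here is a minimal sketch of saving with an explicit protocol and confirming what was written. The helper name `save_compatible`, the column names, and the temporary output path are made up for illustration and are not taken from this repo. It only checks the protocol number in the file header; whether the pandas version in the 3.6 environment can rebuild the object is a separate question that this sketch does not test.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_compatible(df: pd.DataFrame, path, protocol: int = 4) -> None:
    """Pickle a DataFrame with an explicit protocol (hypothetical helper)."""
    df.to_pickle(path, protocol=protocol)


# Toy data; the column names are placeholders, not the pipeline's real columns.
df = pd.DataFrame({"cluster_id": [1, 2, 3], "firing_rate": [0.5, 2.1, 7.3]})

out_path = Path(tempfile.mkdtemp()) / "cat.pkl"
save_compatible(df, out_path)

# Pickles written with protocol 2 or later start with the byte 0x80
# followed by one byte holding the protocol number.
written_protocol = out_path.read_bytes()[1]
restored = pd.read_pickle(out_path)

print("protocol written:", written_protocol)
print("round trip equal:", restored.equals(df))
```

Passing `protocol=4` explicitly matters because `to_pickle` otherwise defaults to the highest protocol the running interpreter supports, which is 5 on Python 3.8.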

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.

Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

From: https://docs.python.org/3/library/pickle.html
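If it helps to check which protocol an existing file was written with, the following standard-library sketch reads it from the pickle header. The function name `pickle_protocol` and the sample payload are hypothetical, and it assumes the bytes come from an uncompressed pickle.

```python
import pickle


def pickle_protocol(data: bytes) -> int:
    """Return the protocol number declared in a pickle's header.

    Protocols 2 and later begin with the PROTO opcode (0x80) followed by
    one byte holding the protocol number. Older pickles have no such
    header, so this sketch reports them as -1 rather than guessing.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return -1


# Placeholder payload standing in for real analysis output.
payload = {"session": "example", "n_cells": 12}

for proto in (4, pickle.HIGHEST_PROTOCOL):
    detected = pickle_protocol(pickle.dumps(payload, protocol=proto))
    print("requested", proto, "detected", detected)
```

For a file on disk, reading the first two bytes is enough, for example `pickle_protocol(open(path, "rb").read(2))`.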

This would only affect newly saved dataframes but you could write a quick 3.8 script to glob, load, and re-save your dataframes using protocol 4
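A rough sketch of such a re-save script, to be run from the 3.8 environment. The function name `resave_pickles`, the `**/*.pkl` pattern, and the demo file name are assumptions for illustration; the real output folders and file extensions may differ. It overwrites files in place, so it would be worth trying on a copy of the data first.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def resave_pickles(root, pattern="**/*.pkl", protocol=4):
    """Re-save every DataFrame pickle under root using the given protocol.

    Returns the list of paths that were rewritten.
    """
    converted = []
    for path in sorted(Path(root).glob(pattern)):
        obj = pd.read_pickle(path)
        if not isinstance(obj, pd.DataFrame):
            continue  # leave pickles that are not DataFrames untouched
        obj.to_pickle(path, protocol=protocol)
        converted.append(path)
    return converted


# Demo in a temporary folder: write one file with the interpreter's
# highest protocol, then convert it.
demo_root = Path(tempfile.mkdtemp())
df = pd.DataFrame({"trial": [1, 2], "speed": [3.4, 5.6]})
df.to_pickle(demo_root / "shuffled.pkl", protocol=pickle.HIGHEST_PROTOCOL)

converted = resave_pickles(demo_root)
for path in converted:
    # The second byte of the file holds the protocol number.
    print(path.name, "now uses protocol", path.read_bytes()[1])
```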

(Python 3.8 does have the walrus operator so it would be nice to upgrade someday anyway...)

:=

@klaragerlei
Collaborator Author

df.to_pickle('cat.pkl', protocol=4)

I like this idea. @HDClark94 , is there any reason for using protocol 5, or would it be okay to change this?

2 participants