Add a Dockerfile container description #31
base: master
Many of the Python machine-learning libraries are under fairly rapid development, and I had trouble getting `coptic_nlp.py` to run on my system. Having explored which versions are functional, this documents a working configuration in a Linux container build description. My basic approach was to look for versions contemporary with the V3.0.0 tag.

- Ubuntu 18.04 was the current LTS release at the time. Its packaged Python (3.6.9) is old enough that the scripts run without warnings. I also used Ubuntu packages for the more stable Python libraries.
- Manually install scikit-learn, since I had trouble installing it directly from requirements.txt. Note this uses 0.19.0, based on the version given there and in the README, but at runtime sklearn warns that LabelEncoder was pickled with version 0.20.2.
- Manually install xgboost 0.82 as a roughly contemporary version.
- The current releases of joblib (1.0.1) and depedit (3.2.0.0) seem to work fine.
- Install and symlink the Ubuntu-packaged foma.
- Download and unpack maltparser-1.8.

With these steps the included tests all pass.
This is the most recent release which declares itself compatible with the in-tree pickle models. To use a more recent version, these need to be migrated to the portable `save_model` API.
This is the release expected by the in-tree pickled modules, silencing a startup warning.
Documentation and `requirements.txt` specify scikit-learn version 0.19.0, but with this version the program warns on startup that the LabelEncoder module was pickled with the more-recent 0.20.2 version. Update references to this specific version to silence the warning. Also specify xgboost 0.90, which is the most-recent release able to load the in-tree models.
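A quick way to confirm that an environment matches these pins before running the tool is a small version check. The following is only a sketch: the two pins come from the commit message above, while the helper names (`installed_version`, `find_mismatches`) are hypothetical and not part of the repository. It relies on each package exposing a `__version__` attribute and sticks to syntax that works on the Python 3.6 shipped with Ubuntu 18.04.

```python
import importlib

# Pins taken from the commit message above; the helpers below are illustrative only.
EXPECTED = {"sklearn": "0.20.2", "xgboost": "0.90"}


def installed_version(module_name):
    """Return the module's __version__, or None if it is missing or unversioned."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(expected, lookup=installed_version):
    """Return {module: (wanted, found)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, (wanted, found) in sorted(problems.items()):
        print("{}: expected {}, found {}".format(name, wanted, found or "not installed"))
    if not problems:
        print("All pinned versions match.")
```

Running this inside the container should report no mismatches if the pinned packages were installed as described; any other output points at the package to reinstall.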
@rillian thanks for setting this up - could I ask you to close this and open a PR against the dev branch instead? The Docker support will be great to add for a bundled release with the current architecture, though I should mention we are working to remove the Java dependencies and do tagging and parsing completely in Python, which should simplify some of the installation. At that point we should update the Dockerfile as well.
Thanks for taking a look. If the general idea is useful, why not merge these changes into the default branch? It would be helpful to users until the next major release. I'll look at doing something similar for the
Active development is being done in the dev branch - if we merge this into master then there will be a conflict with dev once we are ready to pull experimental features into the stable master branch. We basically only commit to dev, and things land in master via PR from dev when we feel something is useful and sufficiently stable/cleaned up. Thanks again for contributing!
I had a lot of trouble getting `coptic_nlp.py` to run on my systems. Here's an environment which worked for me. Hopefully it will help others wanting to run the tool locally.