Add a Dockerfile container description #31

Open: rillian wants to merge 4 commits into master

@rillian rillian commented Aug 31, 2021

I had a lot of trouble getting coptic_nlp.py to run on my systems. Here's an environment that worked for me. Hopefully it will help others who want to run the tool locally.

Many of the Python machine-learning libraries are under fairly
rapid development, and I had trouble getting coptic_nlp.py to
run on my system.

Having explored which versions are functional, this documents a
working configuration in a Linux container build description. My
basic approach was to look for versions contemporary with the
V3.0.0 tag.

- Ubuntu 18.04 was the current LTS release at the time. This has
  an old enough packaged python (3.6.9) that the scripts work
  without warning. I also used ubuntu packages for the more stable
  python libraries.

- Manually install scikit-learn, since I had trouble installing it
  directly from requirements.txt. Note that this uses 0.19.0, the
  version given there and in the README, but at runtime sklearn
  warns that LabelEncoder was pickled with version 0.20.2.

- Manually install xgboost 0.82 as a roughly contemporary version.

- The current releases of joblib (1.0.1) and depedit (3.2.0.0)
  seem to work fine.

- Install and symlink the Ubuntu-packaged foma.
- Download and unpack maltparser-1.8.

With these steps the included tests all pass.
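
For anyone reproducing this setup, one quick way to confirm the Python side matches is to compare the installed versions against the ones listed above. The following is a minimal sketch and is not part of this PR: the `EXPECTED` table only restates the versions from the list, and the helper names `installed_version` and `find_mismatches` are made up for illustration.

```python
"""Compare installed package versions against a known-good set."""

# Versions restated from the list above; adjust if the pins change.
EXPECTED = {
    "scikit-learn": "0.19.0",
    "xgboost": "0.82",
    "joblib": "1.0.1",
    "depedit": "3.2.0.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    # Fallback for older interpreters such as the 3.6.9 mentioned above.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {name: (wanted, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)  # None means the package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(EXPECTED).items()):
        print("{}: expected {}, found {}".format(name, wanted, found))
```

Passing a different `expected` mapping lets the same check be reused if the pinned versions change.
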

This is the most recent release which declares itself compatible
with the in-tree pickle models. To use a more recent version,
these need to be migrated to the portable `save_model` API.

This is the release expected by the in-tree pickled modules,
silencing a startup warning.

Documentation and `requirements.txt` specify scikit-learn version
0.19.0, but with this version the program warns on startup that
the LabelEncoder module was pickled with the more-recent 0.20.2
version. Update references to this specific version to silence
the warning.

Also specify xgboost 0.90, which is the most-recent release
able to load the in-tree models.
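
The pins can of course be edited by hand. As a rough illustration only, a small helper along these lines could rewrite them in the text of a requirements file. This is a sketch under assumptions: `set_pin` is a hypothetical name, and it only handles simple lines such as `name` or `name==version`, leaving anything more complex untouched.

```python
import re


def set_pin(requirements_text, package, version):
    """Return requirements text with `package` pinned to `version`.

    Only lines that already name the package are rewritten; comments,
    blank lines, and other packages are left as they are.
    """
    # Package name at the start of a line, then an optional single
    # specifier such as "==0.19.0" or ">=0.19", then an optional comment.
    pattern = re.compile(
        r"^(\s*)" + re.escape(package)
        + r"(?:\s*[=<>!~]=?\s*[^#\s]*)?(\s*(?:#.*)?)$",
        re.IGNORECASE,
    )
    out = []
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            line = "{}{}=={}{}".format(
                match.group(1), package, version, match.group(2)
            )
        out.append(line)
    result = "\n".join(out)
    # Preserve a trailing newline if the input had one.
    if requirements_text.endswith("\n"):
        result += "\n"
    return result
```

For example, calling `set_pin(text, "scikit-learn", "0.20.2")` and then `set_pin(..., "xgboost", "0.90")` on the contents of `requirements.txt` would apply the two versions mentioned above.
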

@amir-zeldes
Member

@rillian thanks for setting this up - could I ask you to close this and open a PR against the dev branch instead?

The Docker support will be great to add for a bundled release with the current architecture, though I should mention we are working to remove the Java dependencies and do tagging and parsing completely in Python, which should simplify some of the installation. At that point we should update the docker file as well.

@rillian
Author

rillian commented Sep 21, 2021

Thanks for taking a look. If the general idea is useful, why not merge these changes into the default branch? It would be helpful to users until the next major release.

I'll look at doing something similar for the dev branch, but it will probably be a while.

@amir-zeldes
Member

> why not merge these changes into the default branch

Active development is being done in the dev branch - if we merge this into master then there will be a conflict with dev once we are ready to pull experimental features into the stable master branch. We basically only commit to dev and things land in master via PR from dev when we feel something is useful and sufficiently stable/cleaned up. Thanks again for contributing!
