PyPLN is a distributed pipeline for natural language processing, written in Python. It is built on NLTK and ZeroMQ. The goal of the project is to make it easy to use NLTK for processing large corpora, with a Web interface.
We don't have a production release yet, but one is scheduled for our next milestone.
PyPLN is sponsored by Fundação Getulio Vargas.
PyPLN is free software, released under the GPLv3 (https://gnu.org/licenses/gpl-3.0.html).
Our documentation is hosted using GitHub Pages:
- PyPLN Documentation (created using Sphinx)
- Code reference (created using epydoc)
You will need some Python packages, libmagic, and poppler-utils.
To install dependencies (on a Debian-like GNU/Linux distribution):
    sudo apt-get install python-setuptools libmagic-dev poppler-utils
    pip install virtualenv virtualenvwrapper
    mkvirtualenv pypln.backend
    pip install -r requirements/production.txt
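After installing, it can be useful to confirm that the system packages are visible from Python. The following is a minimal sketch, not part of PyPLN: the function name `check_system_dependencies` is hypothetical, it assumes Python 3.3 or later (for `shutil.which`), and it uses `pdftotext` only as a representative tool shipped in poppler-utils.

```python
import ctypes.util
import shutil


def check_system_dependencies():
    """Map each system dependency to True if it appears to be installed.

    Hypothetical helper for illustration; PyPLN does not ship this function.
    """
    return {
        # find_library returns None when no shared library named "magic" is found.
        "libmagic": ctypes.util.find_library("magic") is not None,
        # pdftotext is one of the command-line tools shipped in poppler-utils.
        "poppler-utils": shutil.which("pdftotext") is not None,
    }


if __name__ == "__main__":
    for name, found in sorted(check_system_dependencies().items()):
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

A dependency reported as MISSING usually means the corresponding apt-get package was not installed or is not on the library or executable search path.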
You will also need to install NLTK data. You can do so following the NLTK documentation.
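As a rough sketch of what this step can look like from Python, the helper below checks which NLTK data packages are missing and downloads them with `nltk.download`. The package list in `EXAMPLE_RESOURCES` is an assumption made for illustration; the source does not state which NLTK data PyPLN needs, and the function names are hypothetical.

```python
def missing_nltk_resources(resources, find=None):
    """Return the package ids in `resources` whose data path cannot be located.

    `resources` maps an NLTK package id to its path inside the NLTK data
    directory. `find` defaults to nltk.data.find, which raises LookupError
    when a resource is not installed.
    """
    if find is None:
        import nltk.data  # imported lazily so the helper loads without NLTK
        find = nltk.data.find
    missing = []
    for package_id, data_path in resources.items():
        try:
            find(data_path)
        except LookupError:
            missing.append(package_id)
    return missing


# Assumed selection for illustration only; adjust to what your pipeline uses.
EXAMPLE_RESOURCES = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
}


def download_missing(resources):
    """Download every missing package; requires NLTK and network access."""
    import nltk
    for package_id in missing_nltk_resources(resources):
        nltk.download(package_id)
```

Calling `download_missing(EXAMPLE_RESOURCES)` inside the `pypln.backend` virtualenv would fetch only the packages that are not already present.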
To run tests:
    workon pypln.backend
    pip install -r requirements/development.txt
    make test
See our code guidelines.